String Split – Count the Frequency of a Certain String
【Question】
ID text
1 &aaa&&bbb&
2 &aaa&&bbb&&123&
3 &aaa&&ccc&&456&
4 &aaa&&ddd&&eee&&222&
5 &aaa&&ddd&&eee&&789&
I want to find how many times a certain string appears. The string will start a value and appear at least twice. Here’s what I want:
&aaa& 5 times
&aaa&&bbb& 2 times
&aaa&&ddd& 2 times
&aaa&&ddd&&eee& 2 times
The basic format of a string is &[a-zA-Z0-9]&. There can be multiple such strings, like &[a-zA-Z0-9]&&[a-zA-Z0-9]&.
【Answer】
The key to you question is string split. A stored procedure is a little easier than a SQL query. But both are complicated. I handle it in SPL (Structure Process Language):
A |
|
1 |
$select text from tb |
2 |
=A1.(text).(replace(~,"&&","&,&").array()) |
3 |
=A2.conj().group(~:str;~.count():num).select(num>1) |
A2: Split each text value into a sequence where each member is a &[a-zA-Z0-9]&;
A3: Loop over each letter to repeatedly join it up to the “&” sign, which is the resulting string, and then group the original data by the resulting string and find the group whose member is greater than 1. Here’s the final result:
An SPL script can be integrated into another application via esProc JDBC. Refer to How to Call an SPL Script in Java for details.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL