Group by a Sign and Then Regroup for Aggregation
【Question】
I’m new to Java. I want to read in data from a txt file, modify it and write the frequency of each neighboring pair of letters to a new txt file (like ab= times, ba= times, aa= times, etc.). The asterisk sign * and the pound sign # represent the start and the end of a string of letters. It would be best if the code can be executed directly and is beginner-friendly, with comments provided. Thanks!
Here’s a part of the source text file (the whole file is in this format):
*
a
b
#
*
a
b
b
#
*
a
a
b
c
#
*
a
c
c
b
#
*
d
#
*
a
d
b
a
d
d
c
#
【Answer】
It’s rather complicated to do this in Java. You can handle it in SPL (Structured Process Language) and then integrate it with the Java application:
  | A
1 | =file("E:\\s.txt").import@i()
2 | =A1.select(~!="#").group@i(~=="*")
3 | =A2.conj(~.([~[-1],~]).to(3,))
4 | =A3.groups(~:a;count(~):b)
5 | =A4.new(a.concat()+"="+string(b)+"time")
6 | =file("E:\\result.txt").export(A5)
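A1: Read in data from s.txt. With the @i option, the single-column content is returned as a sequence, one member per line.
A2: Remove the pound signs “#”, then split the remaining members into groups, starting a new group at each asterisk sign “*”.
A3: For every group, pair each member with the member just before it, keep only the pairs from the third one onward (the first two pairs involve the missing predecessor and the “*” marker rather than two letters), and concatenate the kept pairs of all groups into one sequence.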
A4: Group the sequence to find the frequency of each pair of letters.
A5: Generate a new table sequence from A4 according to the required format.
index | a.concat()+"="+string(b)+"time"
1 | aa=1 time
2 | ab=3 times
3 | ac=1 time
4 | ad=2 times
5 | ba=1 time
6 | bb=1 time
7 | bc=1 time
8 | cb=1 time
9 | cc=1 time
10 | db=1 time
11 | dc=1 time
12 | dd=1 time
A6: Export A5’s table sequence to a specified text file.
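For comparison, the same counting logic can be sketched in plain Java. This is only an illustrative sketch, not a translation of the SPL script: instead of grouping and then pairing, it makes a single pass and remembers the previous letter. The class name PairFrequency, the method names countPairs and format, and the file names in the closing comment are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PairFrequency {

    // Count every neighboring pair of letters inside each "*" ... "#" block.
    static Map<String, Integer> countPairs(List<String> lines) {
        // TreeMap keeps the pairs sorted alphabetically (aa, ab, ac, ...).
        Map<String, Integer> counts = new TreeMap<>();
        String previous = null; // previous letter in the current block; null at a block start
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.equals("*") || line.equals("#")) {
                previous = null; // a marker ends the current run of letters
                continue;
            }
            if (previous != null) {
                // Add 1 to the count of the pair "previous + current".
                counts.merge(previous + line, 1, Integer::sum);
            }
            previous = line;
        }
        return counts;
    }

    // Turn the counts into lines such as "ab=3 times" or "aa=1 time".
    static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int n = e.getValue();
            out.add(e.getKey() + "=" + n + (n == 1 ? " time" : " times"));
        }
        return out;
    }

    public static void main(String[] args) {
        // A small inline sample in the same format as the source file.
        List<String> sample = Arrays.asList(
                "*", "a", "b", "#",
                "*", "a", "b", "b", "#",
                "*", "d", "#");
        // Prints "ab=2 times" and then "bb=1 time".
        format(countPairs(sample)).forEach(System.out::println);
        // With real files (placeholder paths), the input could come from
        // java.nio.file.Files.readAllLines and the output go to Files.write.
    }
}
```

Resetting the remembered letter at both “*” and “#” means a pair never spans two blocks, and a block with a single letter (such as the one containing only d) contributes nothing.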
The SPL script can be conveniently integrated into a Java application. See How to Call an SPL Script in Java.