Structurize a Text File with Regular, Indefinite Lines
【Question】
I have a CSV file with non-standardized content. It looks something like this:
John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate
What I'm trying to do is write logic in Java to separate them. I would like to make "John, 001" the header and treat all rows under John, up to Mary, as John's. Is this possible, or do I have to do it manually?
Although the input is not standardized, there is a noticeable pattern: rows without a name always start with a date. My goal is a Java object that I can store in the database in the format below:
Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015
【Answer】
This task requires several structured computations: grouping the lines under their header, aligning each group to a fixed list of foods, and assembling the result rows. Java has no built-in class library for structured computation, so these steps have to be hand-coded, which makes the code long and hard to follow. SPL handles them directly, and the resulting script is short and easy to read:
|   | A | B |
| 1 | =file("D:\\noneStand.csv").cursor@c() | =["hamburger","pizza","ice cream","chocolate"] |
| 2 | =create(name,${foodlist}) | |
| 3 | for A1;!isdigit(left(#1,1)) | =A3.to(2,).align(B1,#2) |
| 4 | | =A2.record(A3.#1 | B3.(#1)) |
A1: Read in noneStand.csv as a cursor; comma is the separator.
A2: Create the result table. In ${foodlist}, foodlist is a parameter whose value is hamburger,pizza,'ice cream',chocolate. The ${} syntax parses that value as part of the expression, so the food names become the table's fields after name.
A3: Loop over A1, passing one complete group of records to A3 in each iteration. A line whose first field does not start with a digit (a name line) begins a new group, and the date lines that follow it belong to that group. B3 and B4 form the loop body.
B3: Align A3’s records, from the second one onward, with the food list in B1 according to their second field. Below is the alignment result for Mary’s group:
01/01/2015, hamburger
02/01/2015, pizza
NA, NA
NA, NA
B4: Append a record to A2’s table. A3.#1 returns the first field value of A3’s first record, such as Mary; B3.(#1) is the sequence of B3’s first-field values, which is [01/01/2015, 02/01/2015, NA, NA] for Mary’s group. The "|" sign concatenates the two.
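Since the question was asked in a Java setting, the same three steps (group by header line, align to the food list, build one record per group) can also be sketched in plain Java. This is only an illustrative sketch, not a translation of the SPL script: the class name GroupedCsv, the methods isHeader and toRecords, the in-memory sample lines, and the "NA" placeholder are all assumptions made for the example, and it assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.List;

class GroupedCsv {
    // Output column order; plays the role of the food list in B1 (assumed fixed here).
    static final List<String> FOODS =
            List.of("hamburger", "pizza", "ice cream", "chocolate");

    // A header line (name, id) is one whose first character is not a digit,
    // the same test the SPL script expresses as !isdigit(left(#1,1)).
    static boolean isHeader(String line) {
        return !Character.isDigit(line.charAt(0));
    }

    // Builds one row per header line: [name, date for each food or "NA"].
    static List<List<String>> toRecords(List<String> lines) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = null; // row of the group being filled
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = line.split("\\s*,\\s*", 2);
            if (isHeader(line)) {
                // Start a new group: name first, then one "NA" slot per food.
                current = new ArrayList<>();
                current.add(fields[0]);
                for (int i = 0; i < FOODS.size(); i++) {
                    current.add("NA");
                }
                records.add(current);
            } else if (current != null && fields.length == 2) {
                // Detail line: put the date into the column of its food.
                int col = FOODS.indexOf(fields[1]);
                if (col >= 0) {
                    current.set(col + 1, fields[0]);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "John, 001",
                "01/01/2015, hamburger",
                "02/01/2015, pizza",
                "03/01/2015, ice cream",
                "Mary, 002",
                "01/01/2015, hamburger",
                "02/01/2015, pizza",
                "John, 003",
                "04/01/2015, chocolate");
        for (List<String> row : toRecords(lines)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

With the sample lines from the question, main prints the three rows of the requested output (John, Mary, and the second John), with NA where a food has no date. To read a real file, the hard-coded list could be replaced with the result of java.nio.file.Files.readAllLines.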