Handle Commas inside Quotation Marks
【Question】
I am trying to parse a comma-separated string using:
val array = input.split(",")
Then I noticed that some input lines have commas inside quotation marks:
data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5
*Note that the data is not very clean, so some fields are inside quotation marks while others are not.
How do I split such a line into:
array(0) = data0
array(1) = data1
array(2) = data2
array(3) = data3
array(4) = data4-1, data4-2, data4-3
array(5) = data5
【Answer】
The key to this problem is to treat only the commas outside quotation marks as separators and to ignore those inside them. It can be handled in Java, but the code is complicated. Since splitting is the only computation needed in this case, we can do it in SPL (Structured Process Language) and then embed the script into Java. A one-liner is enough:
|   | A |
| 1 | =file("d:\\source.txt").import@qc() |
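A1: The f.import() function reads the text file source.txt as a two-dimensional table. The @c option uses commas as the field separator, and the @q option removes the quotation marks automatically. Each line becomes one record, and a quoted field such as data4-1, data4-2, data4-3 is kept as a single value.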
If the string to be handled comes from a variable (say str), A1’s code should be =str.import@qc(). Take the following two-line string as an example:
data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5
"data0, data0", data1, "data2", data3-1, "data4-2", data5
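The result is then a table of two records with six fields each, in which quoted fields such as data0, data0 keep their inner commas as part of the value.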
An SPL script is easily integrated into a Java application. (See How to Call an SPL Script in Java)
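For comparison, here is a minimal sketch of the plain-Java route that the answer describes as possible but more complicated. It is not part of the SPL solution. The class name QuotedCsvSplitter and the method split are hypothetical names chosen for illustration. The sketch assumes that quotation marks always come in pairs, that there are no escaped quotes inside a field, and that whitespace around each field can be trimmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an SPL or library API.
class QuotedCsvSplitter {

    // Splits one line on commas that lie outside double quotes.
    // Assumptions: quotes are balanced, no escaped quotes ("") occur,
    // and leading/trailing whitespace of each field is not significant.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Toggle the state and drop the quotation mark itself.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                // A comma outside quotes ends the current field.
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                // Any other character, including a comma inside quotes, is data.
                current.append(c);
            }
        }
        // The last field has no trailing comma.
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        String input = "data0, \"data1\", data2, data3, \"data4-1, data4-2, data4-3\", data5";
        List<String> array = split(input);
        for (int i = 0; i < array.size(); i++) {
            System.out.println("array(" + i + ") = " + array.get(i));
        }
    }
}
```

Running main on the sample line from the question prints the six fields listed there, with data4-1, data4-2, data4-3 kept as one field. Real-world CSV (escaped quotes, line breaks inside quoted fields) needs more states than this sketch has, which is part of why the one-line SPL version is attractive.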