9.19 Parse text records whose line count varies, using a regular expression
Use a regular expression to parse a text file made up of multiple groups, each of which contains a different number of lines.
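We have a log file in which each group may contain a different number of lines, and we want to parse it into structured data. Each group begins with the marker "Object Type:". Among its other lines, a group contains the fields left, top, right, bottom, Line Color, Fill Color, Link, Type, Condition Type, Statement and Expression.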
The function A.regex(rs,Fi) searches each string member of sequence A for a match of the regular expression rs. It returns the result as a table sequence whose fields are Fi.
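As a rough illustration of the behavior described above, the following Python sketch applies one pattern to every string in a list. It pairs the capture groups with field names and returns a list of dicts in place of a table sequence. This is not SPL's implementation, and several details are assumptions. The function name `regex_table` is hypothetical. The sketch uses `re.search`, so a match may start anywhere in the string. Skipping strings that do not match is an assumption about how A.regex() treats such members.

```python
import re
from typing import Dict, List, Sequence


def regex_table(members: Sequence[str], pattern: str,
                fields: Sequence[str]) -> List[Dict[str, str]]:
    """Hypothetical Python analogue of SPL's A.regex(rs,Fi)."""
    compiled = re.compile(pattern)
    # Each capture group becomes one field, so the counts must agree.
    if compiled.groups != len(fields):
        raise ValueError("pattern has %d groups but %d field names were given"
                         % (compiled.groups, len(fields)))
    rows: List[Dict[str, str]] = []
    for text in members:
        m = compiled.search(text)
        if m is None:
            # Assumption: members without a match produce no record.
            continue
        rows.append(dict(zip(fields, m.groups())))
    return rows
```

For example, `regex_table(["left:1\ntop:2", "no match"], r"left:(.+)[\s\S]+top:(.+)", ["left", "top"])` returns `[{'left': '1', 'top': '2'}]`. By default `.` does not match a line break, so `(.+)` captures only the rest of its line. The class `[\s\S]+` matches any character, including line breaks, so it skips however many lines lie between two fields. This is why the SPL pattern alternates between the two.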
SPL script:
A | |
---|---|
1 | =file("report.log").read() |
2 | =A1.split("Object Type:").delete(1) |
3 | =A2.regex("(.+)[\\s\\S]+left:(.+)[\\s\\S]+top:(.+)[\\s\\S]+right:(.+)[\\s\\S]+bottom:(.+)[\\s\\S]+Line Color:(.+)[\\s\\S]+Fill Color:\\t\\t(.+)[\\s\\S]+Link:(.+)[\\s\\S]+Type: (.+)[\\s\\S]+Condition Type:(.+)[\\s\\S]+Statement:\\s+(.+)[\\s\\S]+Link:(.+)[\\s\\S]+Type: (.+)[\\s\\S]+Expression :(.+)";ObjectType,left,top,right,bottom,lineColor,fillColor,objectLink,type,conditionType,statement,statementLink,statementType,lastExpress) |
4 | =file("result.txt").export@t(A3) |
A1 Read the log file and return it as a string.
A2 Split the text at the marker "Object Type:" into one string per group, and delete the first member, which holds the text before the first marker.
A3 Match each member of A2 against the regular expression and assemble the captured groups into one record per member, using the listed field names.
A4 Export A3's result to result.txt; the @t option writes the field names as the first row.
Execution result:
ID | ObjectType | left | top | right | bottom | lineColor | fillColor | … |
---|---|---|---|---|---|---|---|---|
1 | Symbol | 695 | 51 | 723 | 75 | RGB (0 0 0) | RGB (255 255 0) | … |
… | … | … | … | … | … | … | … | … |
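The following is a minimal Python sketch of steps A1 to A4, for readers who want to reproduce the same split-then-match approach outside esProc. It illustrates the technique under several assumptions and is not an equivalent of the SPL script:

- The pattern is trimmed to the first five fields (ObjectType, left, top, right, bottom) to keep the example short.
- The names `parse_log` and `to_tsv` are hypothetical.
- Captured values are stripped of surrounding whitespace, which the SPL script does not do.
- Chunks that do not match are skipped.

```python
import csv
import io
import re
from typing import List

# Trimmed version of the SPL pattern: only the first five capture groups.
# `.` stops at a line break, while [\s\S]+ skips any lines in between.
PATTERN = re.compile(
    r"(.+)[\s\S]+left:(.+)[\s\S]+top:(.+)[\s\S]+right:(.+)[\s\S]+bottom:(.+)"
)
FIELDS = ["ObjectType", "left", "top", "right", "bottom"]


def parse_log(text: str) -> List[List[str]]:
    # Like A2: split at the marker and drop the text before the first marker.
    chunks = text.split("Object Type:")[1:]
    rows: List[List[str]] = []
    for chunk in chunks:
        # Like A3: one regex match per chunk becomes one record.
        m = PATTERN.search(chunk)
        if m is None:
            continue  # assumption: non-matching chunks are skipped
        # Stripping whitespace is an addition; the SPL script keeps raw captures.
        rows.append([value.strip() for value in m.groups()])
    return rows


def to_tsv(rows: List[List[str]]) -> str:
    # Like A4 with @t: tab-separated text with a header row of field names.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()
```

A typical run reads report.log into a string, passes the string to `parse_log`, and writes the output of `to_tsv` to result.txt. To cover the remaining fields (Line Color, Fill Color, and so on), extend `PATTERN` and `FIELDS` in the same way. Splitting on "Object Type:" before matching matters. If the greedy `[\s\S]+` were applied to the whole file at once, it could run across group boundaries and merge several groups into one match.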