mars RaqForum 25 No.
253 View • 2 Years ago

6.16 Order-based grouping: by continuous same value – big data

For a huge amount of data, create a new group whenever the value of the grouping field changes from one record to the next, and then summarize each group.
We have a large log file in which logs are written in datetime order. The task is to find the date on which ERROR-level logs appear consecutively the most times.

Date	Time	Level	IP	…
2020/1/1	0:00:01	INFO	166.253.153.234	…
2020/1/1	0:00:02	INFO	99.72.133.239	…
2020/1/1	0:00:04	WARM	99.11.105.39	…
2020/1/1	0:00:05	INFO	117.69.80.195	…
2020/1/1	0:00:11	INFO	79.195.137.228	…
…	…	…	…	…

SPL provides the cs.group() function for grouping a huge number of records read through a cursor. It works on records that are already in order and creates a new group whenever the value of the grouping field differs from that of the previous record.
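
To make the grouping rule concrete, here is a minimal Python sketch of the same idea. It is not SPL and not how esProc implements cs.group(); it uses the standard-library itertools.groupby, which likewise starts a new group only when the key of adjacent items changes. The function name group_consecutive is hypothetical.

```python
from itertools import groupby
from typing import Callable, Hashable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def group_consecutive(
    records: Iterable[T], key: Callable[[T], Hashable]
) -> Iterator[Tuple[Hashable, int]]:
    """Yield (key value, record count) for each run of adjacent records sharing a key.

    A new group starts whenever key(record) differs from the previous record's key,
    so equal keys separated by other values end up in separate groups. Only the
    current run is processed at a time, so the input can be a lazy stream.
    """
    for k, run in groupby(records, key=key):
        # Count the run without materializing it as a list.
        yield k, sum(1 for _ in run)


if __name__ == "__main__":
    levels = ["INFO", "INFO", "WARN", "INFO", "INFO", "INFO"]
    print(list(group_consecutive(levels, key=lambda lv: lv)))
    # [('INFO', 2), ('WARN', 1), ('INFO', 3)]
```

Note that the two INFO runs are reported separately because a WARN record sits between them; this is the behavior that distinguishes order-based grouping from an ordinary GROUP BY.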

SPL script:

	A
1	=file("ServerLog.txt").cursor@t()
2	=A1.group(Date,Level;count(~):ErrorCount)
3	=A2.select(Level=="ERROR")
4	=A3.top(-1;ErrorCount)

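A1 Create a cursor over the log file; the @t option reads the first line as field names.
A2 Use cs.group() to group the records in their existing order: a new group starts whenever the Date or Level value of the next record differs from that of the current one, and count(~) gives the number of records in each group.
A3 Select the groups whose log level is ERROR.
A4 Get the group with the largest count, that is, the longest run of consecutive ERROR logs.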
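
For readers outside SPL, steps A1 to A4 can be approximated in Python as below. This is a sketch under stated assumptions, not an equivalent of the SPL script: it assumes a tab-separated file whose first line holds field names including Date and Level, the function name longest_error_run is hypothetical, and ties keep the earliest run (SPL's top() may resolve ties differently).

```python
import csv
import io
from itertools import groupby
from typing import Iterable, Optional, Tuple


def longest_error_run(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (Date, ErrorCount) for the longest run of consecutive ERROR logs.

    Assumes tab-separated input whose first line holds the field names.
    Rows are streamed one at a time, so the file never has to fit in memory.
    """
    reader = csv.DictReader(lines, delimiter="\t")  # A1: header line gives field names
    best: Optional[Tuple[str, int]] = None
    # A2: a new group starts whenever Date or Level differs from the previous row.
    for (date, level), run in groupby(reader, key=lambda r: (r["Date"], r["Level"])):
        if level != "ERROR":  # A3: keep only ERROR groups
            continue
        count = sum(1 for _ in run)
        if best is None or count > best[1]:  # A4: largest count; earliest wins ties
            best = (date, count)
    return best


if __name__ == "__main__":
    # Small hypothetical sample, not the article's data.
    sample = io.StringIO(
        "Date\tTime\tLevel\n"
        "2020/1/1\t0:00:01\tERROR\n"
        "2020/1/1\t0:00:02\tINFO\n"
        "2020/1/2\t0:00:01\tERROR\n"
        "2020/1/2\t0:00:02\tERROR\n"
    )
    print(longest_error_run(sample))  # ('2020/1/2', 2)
```

With a real file, open it with `open("ServerLog.txt", newline="")` and pass the file object in; csv.DictReader then reads it line by line, which plays the role of the cursor in A1.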

Execution result:

Date	ErrorCount
2020/01/02	4

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL
