
mars RaqForum 25 No.
270 View • 2 Years ago

6.16 Order-based grouping: by continuous same value – big data

With a huge amount of data, start a new group whenever the grouping field value differs from that of the previous record, then aggregate the data in each group.
We have a large log file whose entries are written in datetime order. The task is to find the date with the longest run of consecutive ERROR-level entries.

Date	Time	Level	IP	…
2020/1/1	0:00:01	INFO	166.253.153.234	…
2020/1/1	0:00:02	INFO	99.72.133.239	…
2020/1/1	0:00:04	WARN	99.11.105.39	…
2020/1/1	0:00:05	INFO	117.69.80.195	…
2020/1/1	0:00:11	INFO	79.195.137.228	…
…	…	…	…	…

SPL provides the cs.group() function for grouping a huge number of records through a cursor. It starts a new group whenever the grouping field value differs from that of the preceding record.
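
To make the idea concrete outside SPL, here is a minimal Python sketch of the same adjacent-value grouping. It is an analogue built on the standard library's itertools.groupby, not SPL's implementation; the helper name adjacent_groups and the sample list of levels are hypothetical.

```python
from itertools import groupby


def adjacent_groups(values):
    """Yield (value, run_length) for each run of identical neighbouring values."""
    # groupby closes a group as soon as the key changes, so it never
    # needs to hold more than the current run in memory.
    for key, run in groupby(values):
        yield key, sum(1 for _ in run)


# Hypothetical sample data, not taken from the log file above.
levels = ["INFO", "INFO", "ERROR", "ERROR", "ERROR", "INFO", "ERROR"]
runs = list(adjacent_groups(levels))
print(runs)
```

Note that INFO and ERROR each end up in more than one group: unlike hash-based grouping, the same value starts a fresh group every time it reappears after a different value.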

SPL script:

	A
1	=file("ServerLog.txt").cursor@t()
2	=A1.group(Date,Level;count(~):ErrorCount)
3	=A2.select(Level:"ERROR")
4	=A3.top(-1;ErrorCount)

A1 Create a cursor over the log file.
A2 Use the cs.group() function to group the records, starting a new group whenever the date or log level differs from that of the previous record.
A3 Select the groups whose log level is ERROR.
A4 Get the group with the largest number of consecutive ERROR-level records.
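
The four steps above can be sketched in Python as a single streaming pass. This is only an illustration under stated assumptions: the input is tab-separated with a header row containing Date and Level columns, the function name longest_error_run is hypothetical, and the sample rows are made up for the example rather than taken from ServerLog.txt.

```python
import csv
import io
from itertools import groupby


def longest_error_run(lines):
    """Return (date, count) for the longest run of consecutive ERROR rows, or None."""
    reader = csv.DictReader(lines, delimiter="\t")  # step A1: read rows lazily
    best = None
    # Step A2: group neighbouring rows that share the same (Date, Level).
    for (date, level), run in groupby(reader, key=lambda r: (r["Date"], r["Level"])):
        count = sum(1 for _ in run)
        # Steps A3 and A4: keep only ERROR groups and remember the largest one.
        if level == "ERROR" and (best is None or count > best[1]):
            best = (date, count)
    return best


# Hypothetical sample rows for illustration only.
sample = io.StringIO(
    "Date\tTime\tLevel\tIP\n"
    "2020/1/1\t0:00:01\tINFO\t10.0.0.1\n"
    "2020/1/1\t0:00:02\tERROR\t10.0.0.2\n"
    "2020/1/1\t0:00:03\tERROR\t10.0.0.3\n"
    "2020/1/2\t0:00:01\tERROR\t10.0.0.4\n"
    "2020/1/2\t0:00:02\tERROR\t10.0.0.5\n"
    "2020/1/2\t0:00:03\tERROR\t10.0.0.6\n"
    "2020/1/2\t0:00:04\tINFO\t10.0.0.7\n"
    "2020/1/2\t0:00:05\tERROR\t10.0.0.8\n"
)
result = longest_error_run(sample)
print(result)
```

Because the date is part of the grouping key, a run of ERROR rows that crosses midnight is split into one group per date. For a real file, an open file object (for example one opened with newline="") could be passed in place of the StringIO sample, so that only one row at a time is held in memory.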

Execution result:

Date	ErrorCount
2020/01/02	4

