Find Missing Datetimes from a CSV-format Time Series with Fixed Interval
Problem description & analysis
Below is part of the data in CSV file csv.csv:
2021-02-02 00:00:00
2021-02-02 01:00:00
2021-02-02 02:00:00
2021-02-02 03:00:00
2021-02-02 04:00:00
2021-02-02 05:00:00
2021-02-02 06:00:00
2021-02-02 07:00:00
2021-02-02 08:00:00
2021-02-02 10:00:00
...
The whole data is from 2021-02-02 00:00:00 to 2021-03-03 23:00:00 and has been ordered at an interval of one hour. There are altogether 30 days of data and, in theory, 24 records each day. Actually, there are certain records are left out. Our task is to find those missing records. Below is the desired result:
2021-02-02 09:00:00
2021-02-02 14:00:00
2021-02-03 06:00:00
2021-02-03 09:00:00
2021-02-03 11:00:00
2021-02-03 17:00:00
2021-02-03 18:00:00
2021-02-03 19:00:00
2021-02-03 20:00:00
2021-02-03 23:00:00
...
Solution & explanation
Write the following script p1.dfx in esProc script:
A |
|
1 |
=periods@s("2021-02-02 00:00:00","2021-03-03 23:00:00",3600) |
2 |
=file("csv.csv").import@ci() |
3 |
=A1\A2 |
Explanation:
A1 List all dates and times between 2021-02-02 00:00:00 and 2021-03-03 23:00:00 at the interval of one hour.
A2 Import the date and time data from the CSV file.
A3 Calculate difference between A1 and A2.
Read How to Call an SPL Script in Java to learn how to integrate the script code into a Java program.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL