Data Analysis Programming from SQL to SPL: Stock Indicator Calculation
Data structure and sample data:
In which, sid is the stock code, tdate is the trading date, and close is the closing price.
1. Calculate the 12-day EMA of the daily closing price
The n-day EMA of a stock on its first listing day is the closing price of that day.
The n-day EMA = (the previous day’s n-day EMA * (n-1) + the current day’s closing price * 2)/ (n+1).
When n is 12, it is called the 12-day EMA, abbreviated as e12.
Now we need to calculate the 12-day EMA of stock 600036 for each trading day.
SQL statement:
with recursive tbl as (
select tdate, close, row_number() over(order by tdate) rn
from stktrade where sid='600036'),
tbl2 as (
select tdate,close, cast(close as decimal(22,16)) e12, rn
from tbl where rn=1
union all
select t1.tdate, t1.close, (t1.close*2+t2.e12*11)/13, t1.rn
from tbl t1 join tbl2 t2 on t1.rn-1=t2.rn
)
select * from tbl2;
The e12 for each day is computed based on the e12 of the preceding day. SQL first uses the window function to manually create a sequence number, and then utilizes recursion to calculate. The processes of understanding, writing, and debugging the SQL code are each quite challenging.
A more natural approach to solve this problem is as follows: Sort the stock data by date; the previous record is then the data from the previous trading day. Add a calculated column e12. Except for the first record where e12 equals close, loop through each record and use the current record’s closing price and the previous record’s e12 to calculate the current row’s e12. This will produce the following table:
Having traversed all the records, the expected result is obtained.
However, SQL’s support for procedural calculations is too weak, making it very difficult to implement this approach without using stored procedures.
SPL offers procedural calculation abilities like ordinary programming languages, making it easy to implement the above approach.
A |
|
1 |
=T("stock.csv").select(sid==600036).sort(tdate) |
2 |
=A1.derive(if(#==1,close,(2*close+11*e12[-1])/13):e12) |
A1: Read stock data from the csv file, filter the data of stock with code 600036, and sort by trading date.
A2: This is to implement the approach described above, calculating the e12 column. The derive here is a looping function that directly loops on the set, eliminating the need to write multi-line statement blocks like for...next.
SPL supports cross-row referencing within looping functions. The e12[-1] in the expression represents the calculated e12 value from the previous row. # represents the current row’s sequence number during the loop calculation.
SPL IDE is highly interactive, allowing for step-by-step execution and intuitively viewing the result of each step in the right-hand panel at any time.
2. Calculate the MACD indicator
MACD is a common trend indicator in stock analysis. The related calculation formulas are as follows:
12-day EMA (Fast Line) = (Previous day’s EMA*11 + Today’s closing price*2)/13
26-day EMA (Slow Line) = (Previous day’s EMA*25 + Today’s closing price*2)/27
DIFF = Today’s 12-day EMA - Today’s 26-day EMA
DEA = (Previous day’s DEA*8 + Today’s DIFF*2)/10
MACD = 2*(DIFF-DEA)
Now we want to calculate the MACD indicator for stock 600036 for each trading day.
SQL statement:
with recursive tbl as(
select tdate, close, row_number() over(order by tdate) rn
from stktrade where sid='600036'
),
tbl2 as(
select tdate,close, cast(close as decimal(22,16)) e12,
cast(close as decimal(22,16)) e26, rn
from tbl where rn=1
union all
select t1.tdate, t1.close, (t1.close*2+t2.e12*11)/13,
(t1.close*2+t2.e26*25)/27, t1.rn
from tbl t1 join tbl2 t2 on t1.rn-1=t2.rn
),
tbl3 as (
select tdate,close,e12, e26, e12-e26 diff,rn
from tbl2
),
tbl4 as (
select tdate,close,e12,e26, diff,diff dea,rn
from tbl3 where rn=1
union all
select t3.tdate, t3.close, t3.e12, t3.e26, t3.diff, (t3.diff*2+t4.dea*8)/10, t3.rn
from tbl3 t3 join tbl4 t4 on t3.rn-1=t4.rn
)
select tdate,close,e12,e26,diff,dea,2*(diff-dea) macd
from tbl4;
With the previous SQL code understood, adding the calculation of these formulas is not difficult, but the code becomes more cumbersome, and the calculation of the DEA necessitates a further recursive step.
SPL continues with the natural approach, simply adding some expressions to the derive function of the previous task and performing one more derive step.
A |
|
1 |
=T("stock.csv").select(sid==600036).sort(tdate) |
2 |
=A1.derive(if(#==1,close,(2*close+11*e12[-1])/13):e12,if(#==1,close,(2*close+25*e26[-1])/27):e26) |
3 |
=A2.derive(e12-e26:diff, if(#==1,diff,(2*diff+8*dea[-1])/10):dea, 2*(diff-dea):macd) |
The code complexity has barely changed, and it is still very simple and easy to understand.
3. Find golden cross and death cross
If the DIFF of a certain trading day’s previous day is less than the DEA, and the DIFF of the following day is greater than the DEA, then that trading day is called a golden cross.
If the DIFF of a certain trading day’s previous day is larger than the DEA, and the DIFF of the following day is less than the DEA, then that trading day is called a death cross.
In the figure below, position 2 is a golden cross, and position 1 is a death cross:
Now we want to find all the golden cross and death cross trading days for stock 600036.
SQL statement:
with recursive tbl as (
select tdate, close, row_number() over(order by tdate) rn
from stktrade
where sid='600036'
),
tbl2 as (
select tdate,close, cast(close as decimal(22,16)) e12,
cast(close as decimal(22,16)) e26, rn
from tbl where rn=1
union all
select t1.tdate, t1.close, (t1.close*2+t2.e12*11)/13,
(t1.close*2+t2.e26*25)/27, t1.rn
from tbl t1 join tbl2 t2 on t1.rn-1=t2.rn
),
tbl3 as (
select tdate,close,e12, e26, e12-e26 diff,rn
from tbl2
),
tbl4 as (
select tdate,close,e12,e26, diff,diff dea,rn
from tbl3 where rn=1
union all
select t3.tdate, t3.close, t3.e12, t3.e26, t3.diff, (t3.diff*2+t4.dea*8)/10, t3.rn
from tbl3 t3 join tbl4 t4 on t3.rn-1=t4.rn
),
tbl5 as (
select tdate,close,e12,e26,diff,dea,2*(diff-dea) macd,
case when rn=1 then 0
when lag(diff) over(order by rn)<lag(dea) over(order by rn) and diff>dea then 1
when lag(diff) over(order by rn)>lag(dea) over(order by rn) and diff<dea then -1
else 0 end `cross`
from tbl4
)
select * from tbl5 where `cross`<>0;
This code adds another layer of subquery and also requires adding window functions, making it longer and more complex.
The SPL code for the previous task has obtained DIFF and DEA, and now we just need one more step to calculate the golden cross and death cross:
A |
|
1 |
=T("stock.csv").select(sid==600036).sort(tdate) |
2 |
=A1.derive(if(#==1,close,(2*close+11*e12[-1])/13):e12,if(#==1,close,(2*close+25*e26[-1])/27):e26) |
3 |
=A2.derive(e12-e26:diff, if(#==1,diff,(2*diff+8*dea[-1])/10):dea, 2*(diff-dea):macd) |
4 |
=A3.derive(if(diff[-1]<dea[-1]&&diff>dea:1,diff[-1]>dea[-1]&&diff<dea:-1;0):cross) |
5 |
=A4.select@a(cross!=0) |
A4: After the steps in A3, perform another loop calculation to obtain the golden cross and death cross flag, cross. The approach is clear.
A5: Use the select@a function to obtain all results that meet the condition cross!=0, that is, all golden crosses and death crosses.
4. Find MACD top divergence
Divide the stock data into different intervals based on the order of trading days. If the maximum closing price (or other indicator) of a later interval is greater than the maximum of the preceding adjacent interval, it is called a closing price (or other indicator) making a new high. If it is less than, it is called not making a new high.
MACD top divergence refers to a situation where, between two adjacent golden cross to death cross intervals, the closing price makes a new high, but the DIFF does not make a new high.
At the arrow position in the chart above, the stock price is still continuously rising, but the MACD indicator is diverging downwards, indicating top divergence.
Now we want to find the last record of the first interval within every pair of intervals exhibiting MACD top divergence for stock 600036.
Because it is too complex to be implemented using SQL, we won’t try.
In the previous task, SPL has already calculated the index positions of golden crosses and death crosses. Based on this, continue to find the interval positions from golden cross to death cross. Then, determine whether the closing price and DIFF of adjacent intervals make new highs:
A |
|
1 |
=T("stock.csv").select(sid==600036).sort(tdate) |
2 |
=A1.derive(if(#==1,close,(2*close+11*e12[-1])/13):e12,if(#==1,close,(2*close+25*e26[-1])/27):e26) |
3 |
=A2.derive(e12-e26:diff, if(#==1,diff,(2*diff+8*dea[-1])/10):dea, 2*(diff-dea):macd) |
4 |
=A3.derive(if(diff[-1]<dea[-1]&&diff>dea:1,diff[-1]>dea[-1]&&diff<dea:-1;0):cross,0:topdev) |
5 |
=A4.pselect@a(cross==1) |
6 |
=A5.(A4.to(A5.~, A4.pselect(cross==-1,A5.~))) |
7 |
>A6.run(if(#>1&&~[-1].max(close)<~.max(close)&&~[-1].max(diff)>~.max(diff),~.m(-1).topdev=1)) |
8 |
=A4.select(topdev==1) |
A4: Prepare a calculated column topdev with a value of 0 to mark target records.
A5: Find the index positions of all golden crosses. SPL has rich positional operation capabilities, and when filtering, it can not only return the members that meet the conditions, but also the positions of these members.
A6: Loop through the golden cross index positions in A5. A4.pselect(cross==-1, A5.~) will obtain the position of the first death cross after the given golden cross. In this way, the calculation result of A6 will be all golden cross to death cross intervals.
Double-click on a member in the result to view its detailed information. For example, the fifth member:
We can see that this is an interval from a golden cross (cross is 1) to a death cross (cross is -1).
A7: Loop through A6, find the adjacent intervals that meet the MACD line divergence condition, and assign the topdev value of the last record of the first interval to 1.
The members of A6 are a record sequence of the table sequence A4, and no newtablesequence isgenerated. Therefore, A7 actually assigns thetopdevof the target records in table sequence A4 to 1.
A8: Filter A4 to obtain the desired result.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version
Click here to download the data file