SPL Simplified SQL Case Details - Grouping & Join
In the development of database application, we often need to face complex SQL computing, such as join computing in Multi-tier grouping. In SQL, grouping must be aggregated at the same time, and object-based associative access can not be performed, so dealing with this kind of problem will be more complex, which can only be achieved by advanced techniques such as nesting multi-layer sub-queries with window functions. The SPL introduced in this article can support real grouping and provide intuitive object-based associative access, which makes it easier to solve such problems.
Grouping and join are often used in practical business. Here is a general example based on actual business to illustrate the specific process of SPL implementation of grouping and join.
Computing objective: To query out-of-stock DVD stores, that is, the stores whose existing DVD copies are less than four categories.
Data structure:
• Branch table, which stores DVD branch information;
• DVD table, which stores the title and category information of DVD. DVD is virtual data. For example, “Transformers 4” is a DVD, but it is not a visible CD-ROM.
• DVD Copy table, which stores multiple copies of DVD. DVD copies are real CD-ROMs and are stored in various branches in physical form. Note: DVDCopy table is associated with Branch table using BranchID field, and with DVD table using DVDID field.
Following are some data examples:
Branch table:
BID |
Street |
City |
B001 |
street1 |
New York |
B002 |
street2 |
Houston |
B003 |
street3 |
LA |
B004 |
street4 |
Lincoln |
DVD table:
DVDID |
Category |
Title |
D001 |
science fiction |
Transformers IV |
D002 |
science fiction |
Transformers II |
D003 |
science fiction |
Guardians of the Galaxy |
D004 |
act |
The Expendables III |
D005 |
sport |
Need for Speed |
D006 |
feature |
Grace of Monaco |
DVDCopy table:
CopyID |
DVDID |
BID |
Status |
LastDateRented |
LastDateReturned |
MemberID |
C000 |
D001 |
B001 |
7/10/2014 |
7/13/2014 |
M001 |
|
C001 |
D004 |
B001 |
7/10/2014 |
7/13/2014 |
M001 |
|
C002 |
D001 |
B001 |
7/10/2014 |
M001 |
||
C003 |
D005 |
B001 |
7/10/2014 |
7/13/2014 |
M003 |
|
C004 |
D006 |
B001 |
7/10/2014 |
7/13/2014 |
M003 |
|
C005 |
D005 |
B002 |
7/10/2014 |
7/13/2014 |
M003 |
|
C006 |
D002 |
B002 |
7/10/2014 |
7/13/2014 |
M006 |
|
C007 |
D002 |
B002 |
7/10/2014 |
7/13/2014 |
M007 |
|
C008 |
D001 |
B002 |
7/10/2014 |
7/13/2014 |
M008 |
|
C009 |
D004 |
B002 |
7/10/2014 |
7/13/2014 |
M009 |
|
C010 |
D005 |
B002 |
7/10/2014 |
7/13/2014 |
M010 |
|
C011 |
D006 |
B002 |
Miss |
7/10/2014 |
7/13/2014 |
M010 |
C000 |
D001 |
B003 |
7/10/2014 |
7/13/2014 |
M001 |
|
C001 |
D004 |
B003 |
7/10/2014 |
7/13/2014 |
M001 |
|
C002 |
D001 |
B003 |
Miss |
7/10/2014 |
M001 |
|
C003 |
D005 |
B003 |
7/10/2014 |
7/13/2014 |
M003 |
Illustration:
1. The calculation results should be some records in the Branch table.
2. If the Status field in DVDCopy table is “Miss”, then the disc is lost. If the LastDateReturned field is empty, the disc loan has not been returned. Obviously, the lost or unreturned discs are not within the calculation range and should be filtered out.
3. Consideration should be given to the fact that some branches may not have records in DVDCopy table, although this is rare.
Resolving ideas:
1. Filter out the existing DVD copies from the DVDCopy table (no loss or loan)
2. Grouping DVDCopy table according to BID, each group is all DVD copies of a store.
3. Find out the DVDs that correspond to the DVD copies in each store, and then calculate the number of categories of DVDs.
4. Search for stores with DVD categories less than 4, and such stores meet the requirements.
5. Find out stores that do not appear in the DVDCopy table. Such stores also meet the requirements.。
6. Merge the two types of stores that meet the requirements
SPL codes:
A |
|
1 |
=Branch=db.query("select * from Branch") |
2 |
=DVD=db.query("select * from DVD") |
3 |
=DVDCopy=db.query("select * from DVDCopy") |
4 |
=DVDCopy.switch(DVDID,DVD:DVDID; BID,Branch:BID) |
5 |
=DVDCopy.select(STATUS!="Miss" && LASTDATERETURNED!=null) |
6 |
=A5.group(BID) |
7 |
=A6.new(~.BID:BonList, ~.(DVDID).id(CATEGORY).count():CatCount) |
8 |
=A7.select(CatCount<4) |
9 |
=A8.(BonList) | (Branch \ A7.(BonList)) |
10 |
>file("shortage.xlsx").xlsexport@t(A9) |
A1-A3: Retrieve data from the database, named as variables Branch, DVD, DVDCopy respectively. The results are as follows:
A4:=DVDCopy.switch(DVDID,DVD:DVDID; BID,Branch:BID)
Use function switch, switch the DVDID field in DVDCopy table to the corresponding records in DVD table, and switch the BID field to the corresponding records in Branch table. This step is the basis for the object-based associative access, and the results of DVDCopy after calculation are as follows:
The light blue font indicates that the field corresponds to a record, which can be viewed after clicking, as follows:
At this point, only the operator "." need be used for the object-based associative access. For example, DVDCopy.(DVDID).(CATEGORY) represents the corresponding DVD category for each DVD copy. DVDCopy.(BID) represents the detailed information of a branch(complete record) corresponding to a DVD copy.
A5:=DVDCopy.select(STATUS!="Miss" && LASTDATERETURNED!=null)
This code is used to filter data, that is, lost and unreturned DVD copies are not within the scope of calculation, the A5 value after filtering is as follows:
A6:=A5.group(BID)
The above code is used to group data in A5 according to BID, each line representing all DVD copies of a store, as follows:
Click on the light blue font to see the members of the group:
As you can see, the function group only groups the data, and doesn’t perform the aggregate calculation at the same time, which is different from the grouping function in SQL. When we need to further process the grouped data rather than simply aggregate, it is more convenient to use the group function of SPL, such as the code in A7.
A7:=A6.new(~.BID:BonList, ~.(DVDID).id(CATEGORY).count():CatCount)
The above code is used to calculate the number of DVD categories that corresponds to each branch. Function new can generate new object A7 based on the data in A6. A7 has two columns: BonList and CatCount. BonList comes directly from the BID column of data in group A6, and CatCount comes from the DVDID column of data in group A6. CatCount's algorithm is divided into three parts: ~.(DVDID) find the DVD records that all DVD copies in each store correspond to; id(CATEGORY) remove duplicate Category from these DVD records; count() calculate the number of Categories. The result is as follows:
That is: Branch B002 has 3 categories of DVD copies, branch B003 has 3 categories of DVD copies, and branch B001 has 4 categories of DVD copies.
A8:A7.select(CatCount<4)
The above code executes the query and finds stores with CatCount less than 4. The results are as follows:
The stores that are out of stock are calculated on the basis of DVDCopy tables. But some stores that are seriously out of stock may not appear on DVDCopy table. For example, all the DVD copies of the store have been lent out, or the store has no DVD copies at all, so it is necessary to merge these stores. The code is as follows:
A9:=A8.(BonList) | (Branch \ A7.(BonList))
In the above code, the operator "|" denotes the union of two datasets (which can be replaced by a union function), and the operator "\" denotes the difference set calculation (which can be replaced by a diff function). A8.(BonList), Branch, A7.(BonList) represents respectively: stores out of stock in DVDCopy table, all the stores, stores that appear in DVDCopy table. The values are respectively:
A9 is the final result of this case, and its value is:
A10:>file("shortage.xlsx").xlsexport@t(A9)
Finally, the result is exported to the excel file shortage.xlsx. Open the file to see the results as follows:
From this example, we can see that SQL lacks an explicit set and cannot represent data sets with variables such as A8 or Branch, so the short SPL code mentioned above must be implemented with several lengthy SQL.
In addition, SPL can be invoked by reporting tools or Java programs. The method of invoking SPL is similar to that of ordinary database. The JDBC interface provided by SPL can return the result of calculation in ResultSet form to the main program of java. Specific methods can refer to relevant documents. 【How to call an SPL script in Java】
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL