Simplifying Join Syntax with Dimension Alignment
For the example of aligning two child tables to the parent table cited in the previous article, the following is the SQL query:
SELECT Orders.id, Orders.customer, A.x, B.y FROM Orders
LEFT JOIN (SELECT id,SUM(price) x FROM OrderDetail GROUP BY id ) A ON Orders.id=A.id
LEFT JOIN (SELECT id,SUM(amount) y FROM OrderPayment GROUP BY id ) B ON Orders.id=B.id
WHERE A.x > B.y
Obviously this JOIN is useful in real-world queries. But under which type it should be classified?
It involves the Orders table and two subqueries – A and B. Each subquery has a GROUP BY id clause, which means the primary key of the result set will be the id field. Now the three tables (a subquery can be considered a temporary table) involved in the JOIN have the same primary key. They are the homo-dimension tables associated by a one-to-one mapping. That fits into our types.
But the syntax used to simplify the JOIN queries in the previous article can’t be applied to this JOIN query over homo-dimension tables, because both subqueries cannot be omitted.
The prerequisite for a simplifiable JOIN query is that the relationship between the to-be-joined tables needs to be already defined in the whole data structure. In technical terms, we should know the definition of the database metadata. It is unlikely that an ad hoc subquery can be predefined in the metadata, so the table to be joined (a subquery) should be specified.
While the temporary tables to be JOINed cannot be omitted, but since the related fields, which are primary keys, are already specified in GROUP BY, it is unnecessary to write them again in the join conditions. Since the grouping fields, which are also the primary keys defined by GROUP BY in the subqueries, must be selected for performing the outer JOINs, there’s no need to SELECT them when the tables are already GROUPed BY them. Moreover, the child tables over which the subqueries are performed are independent of one another and won’t be associated. We can put both the GROUP and aggregate operations in the main query to get rid of a layer of subquery:
SELECT Orders.id, Orders.customer, OrderDetail.SUM(price) x, OrderParyment.SUM(amount) y
FROM Orders LEFT JOIN OrderDetail GROUP BY id LEFT JOIN OrderPayment GROUP BY id
WHERE A.x > B.y
The join we are performing is far from the JOINs defined in SQL. There’s no trace of Cartesian product in it. And unlike a SQL JOIN that joins two tables, this join aligns tables – OrderDetail, OrderPayment and Orders – to a common primary key field, the id field. All the tables are aligned against a certain base dimension. As they have different dimensions (primary keys), the GROUP BY action may happen during the process, leading to an aggregate operation when referring to a field of the table being grouped. There are no direct associations between the OrderDetail table, the OrderPayment table, and even the Orders table. So there is no need to care about their relationships, or whether there is another table to be joined with the current table. In SQL, the Cartesian product-based JOIN requires at least two tables to define the associated operation. Any change to an involved table or its deletion requires the handling of the matching table, making the query hard to understand.
Ours is the dimension alignment join. Though still within the three types previously defined, it has a different syntax. Unlike the case in SQL, here the JOIN key word is more a conjunction than a verb. The FULL JOIN isn’t a rare thing for the dimension alignment, whereas it is impossible or scarcely seen with the three basic join types.
Though an example of the parent and child tables is used to explain the dimension alignment join, it isn’t the exclusive scenario to which this kind of join is applied (According to the simplified syntax for a join between the parent and child tables, the query should not be so complicated). The dimension alignment join syntax can be applied to any tables to be associated, without the requirement that the related fields be the primary keys or a part of them.
Here are the Contract table, the Payment table and the Invoice table:
The Contract table, with the following fields:
id
date
customer
price
…
The Payment table, with the following fields:
seq
date
source
amount
…
The Invoice table, with the following fields:
code
date
customer
amount
…
To find each day’s contract amount, payment amount, and invoice amount, we can write the query like this:
SELECT Contract.SUM(price), Payment.SUM(amount), Invoice.SUM(amount)
FROM Contract GROUP BY date FULL JOIN Payment GROUP BY date FULL JOIN Invoice GROUP BY date
We can also use a mix of the join types:
Along with the above Contract table, here are the Customer table and the Sellers table:
The Customer table, with the following fields:
id
name
area
…
The Sellers table, with the following fields:
id
name
area
…
The customer field in the Contract table is the foreign key pointing to the Customer table.
To find the number of salespeople and the contract amount in each area:
SELECT Sellers.COUNT(1), Contract.SUM(price)
FROM Sales GROUP BY area FULL JOIN Contract GROUP BY customer.area
The query uses both the dimension alignment join syntax and the foreign key attributization syntax.
In these examples, all the joins are finally become the homo-dimension type. Though uncommon, a dimension alignment join can be of parent-child-table type. We’ll look at it when discussing the concept of dimension. Finally, the above dimension alignment join syntax isn’t complete. It needs a clear definition of dimension to make it perfect.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL