Understanding set orientation from the comparison between SQL and Java, and what are the advantages of SQL over Java?
For the same data calculation task, written in SQL or Java, the latter is often several times longer than the former. The long code is not only cumbersome to write, but also not conducive to understanding the overall business logic structure, and the algorithm and process are buried in the details.
Why is Java so much longer than SQL? Let’s answer this question and introduce the concept of set orientation in programming languages.
Firstly, it is about the computing ability of sets, which is easy to understand.
Business logic often faces structured data, which typically appears in batches (in the form of sets). To facilitate the calculation of such data, programming languages need to provide sufficient set computing capabilities.
Java of the early days did not directly provide a set operation library for structured data, and even lacked decent structured data objects. Making a simple count and sum requires writing many lines, and operations like grouping and join are extremely troublesome.
SQL has more sophisticated set operations, such as aggregation operations like SUM/COUNT, WHERE for filtering, and GROUP for grouping, resulting in much shorter code.
So, isn’t it enough to add some library functions for set operations to Java? For example, Streams after Java8 has quite a few of them, is that enough?
It’s not that simple!
Let’s use sorting operation as an example. In SQL, simply write sorting as “ORDER BY price”, and you don’t need to care about the data type of this PRICE.
Java is a type strict compiled language, and the same function cannot work for different data types. Therefore, it is necessary to write sorting functions for different data types separately, with integers, real numbers, and strings being different. It’s OK if it is just a hassle for library function developers, the problem is that users also need to specify the data type so that the compiler can find the function. Java has invented generic syntax to simplify writing, but there are still a bunch of angle brackets in the code that look messy and affect understanding of the business.
Sorting may face multiple parameters, such as “ORDER BY price, quantity” in SQL.
This is another issue for Java. Functions with different numbers of parameters cannot be mixed together, and it is not possible to exhaustively list all possible numbers of parameters in advance, just like when dealing with data types. The usual method is to write a single parameter function, when encountering multiple parameters, temporarily convert them to a single parameter, such as combining price and quantity into one parameter and then sorting it. Alternatively, it can support set parameters, and when referencing, it is necessary to put the parameters into a set form and add an extra layer. Writing it is quite troublesome.
SQL doesn’t have such a matter, interpretive languages can dynamically decide how to do it based on data type and number.
This matter is not over yet, sorting may also be targeted at a calculation formula, such as “ORDER BY price*quantity” in SQL. This “price*quantity” is not calculated before executing this SQL statement, but rather when traversing the set members. Essentially, “price*quantity” is a function that takes the current set members as parameter, which is equivalent to using a function defined by an expression as a parameter for sorting operation.
In Java, if an expression is written as a parameter to a function, it will be calculated before the call, rather than being calculated separately for set members. Java certainly allows passing a function as a parameter to another function, but the writing method is much more complicated and requires defining a function beforehand.
Passing functions as parameter and do not define them in advance, isn’t this Lambda syntax? Doesn’t Java now support it as well?
Yes, Java now has Lambda syntax, which allows anonymous functions to be defined directly in parameter. But obviously it cannot be written as a simple calculation formula, and when the compiler cannot distinguish it, it will directly calculate it. The Lambda syntax is still the same as regular functions, and it’s necessary to define parameters or even types, there is also an obvious function body, only that it is not named. Moreover, due to the frequent confusion between the data type and number of parameters mentioned earlier and this Lambda syntax, the code becomes even more disorganized.
SQL makes Lambda syntax unnoticeable, and no one even calls it Lambda syntax, but it does indeed define a function using a calculation formula and use it as a parameter.
Moreover, structured data is not simply a single value, but records with fields.
When referencing record fields in SQL expression parameters, in most cases, the field name can be used directly without specifying the table in which the field is located. Only when there are multiple fields with the same name, the table name (or table alias) needs to be prefixed to distinguish them.
The Lambda syntax of Java does not naturally know how to recognize records. To it, it is just a parameter, and to retrieve the field of a record (i.e., member of a class) needs to use the dot operator. If the parameter name representing the current member is “it”, it needs to be written in the verbose form of “it.price*it.quantity”.
Only with a syntax mechanism for directly referencing fields can it be considered a professional language for structured data computation.
Once the operation changes from targeting single values to targeting sets, especially for structured data sets, it becomes much more troublesome.
Is that over yet? Not really. SQL also supports dynamic data structures.
In structured data computing, the calculation results are often structured data, and its structure is related to operations and cannot be prepared before coding. So the ability to support dynamic data structures is needed.
Any SELECT statement in SQL will generate a new data structure, and fields can be added or removed freely in the code without the need to define the structure (class) beforehand. Compiled languages like Java can’t implement this. Before compiling the code, all the structures (classes) used must be defined, and in principle, new structures cannot be dynamically generated during execution.
Another approach is to define a complex class that can be used to describe dynamic structures, where field names and values are treated as array members. However, this is no longer a Java style class, and member references cannot be simply done using dots, but need to call functions, which is very inconvenient. Unless a new language is designed that uses this object as the basic data type, then it can be easily used.
Let’s summarize: a set operation class library, where the types and quantities of parameters can be dynamic; Intangible Lambda syntax, where fields of records can be directly referenced; Dynamic data structure.
These are what we commonly refer to as the set-oriented features of programming languages! With these set-oriented features, it is convenient to process batch structured data.
These seemingly obvious syntax systems in SQL are actually not as simple behind them, which requires careful design.
By this standard, Java itself is not suitable, and no matter how many class libraries are added to Java, it’s still not suitable; the new languages Kotlin and Scala designed based on Java also do not meet this standard. A crucial aspect of this lies in the intangible Lambda syntax, which can only be achieved in interpreted languages. The interpreter then knows whether the calculation formula at a certain position should be calculated first or be treated as a function and pass it in, while compiled languages need to be written as strings or distinguished by some symbol system, which can damage the conciseness of the code. So, these programming languages cannot be as concise as SQL in any way.
esProc SPL can! It is an interpreted language designed based on Java that satisfies all of the aforementioned set-oriented features.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version