Can Stream and Kotlin Replace SQL in Java Applications?
SQL has good computing ability. It is widely used during Java development to handle structured data. Yet the language is tightly bounded with databases and has architectural defects, resulting in code migration difficulty, non-hot-swappable business logic, and cost-ineffective efforts in increasing performance. Today’s application frameworks prefer to implement business logics directly in Java while using the database only for data persistence. This calls for techniques for computing structured data outside databases.
Java in its early days didn’t offer class libraries for structured data computations. Hardcoding was the only choice and it was really hard. Later the high-level language released Stream library and then Kotlin, the reformed version of Stream. The latter much facilitated the process of computing structured data with its Lambda expression, stream-style programming and set functions. After many upgrades, Stream/Kotlin has already been widely used in various projects. It seems that they have the prospect of replacing SQL in data computing field, restricting SQL’s role to data storage and achieving the expected framework.
Stream – pioneer of outside database data computing
When the data object is of a simple data type (integer/floating point/string/date), it is convenient for Stream to implement set-oriented operations. For example:
//Filtering:
IntStream iStream=IntStream.of(1,3,5,2,3,6);
IntStream r1=iStream.filter(m->m>2);
//Sorting:
Stream r2=iStream.boxed().sorted();
//Aggregation:
int r3=iStream.sum();
Compared with coding each operation in Java, the above pieces of code are much shorter, proofs that Stream has good enough computing capabilities. Other regular computations, such as distinct, concat and contain, can also be conveniently achieved.
Stream cannot replace SQL yet
Though Stream is convenient to use when data objects are of simple types, it is not so convenient when handling records, a more complicated type of data objects in structured computations.
Sorting, for example:
Stream<Order> result=Orders
.sorted((sAmount1,sAmount2)->Double.compare(sAmount1.Amount,sAmount2.Amount))
.sorted((sClient1,sClient2)->CharSequence.compare(sClient2.Client,sClient1.Client));
The SQL equivalent: select * from Orders order by Client desc, Amount
The SQL sorting only needs name of the sorting field, but Stream requires to write the field type. The Stream way is more complicated. SQL expresses sorting direction with keywords asc and desc, and Stream uses compare function to express the direction. The latter’s code is complicated. SQL writes the order of fields according to the desired order of the to-be-sorted fields, which is intuitive and natural, while Stream writes it in a reverse order.
Grouping & aggregation:
Calendar cal=Calendar.getInstance();
Map<Object, DoubleSummaryStatistics> c=Orders.collect(Collectors.groupingBy(
r->{
cal.setTime(r.OrderDate);
return cal.get(Calendar.YEAR)+"_"+r.SellerId;
},
Collectors.summarizingDouble(r->{
return r.Amount;
})
)
);
for(Object sellerid:c.keySet()){
DoubleSummaryStatistics r =c.get(sellerid);
String year_sellerid[]=((String)sellerid).split("_");
System.out.println("group is (year):"+year_sellerid[0]+"\t (sellerid):"+year_sellerid[1]+"\t sum is:"+r.getSum()+"\t count is:"+r.getCount());
}
The SQL equivalent: select year(OrderDate), sellerid, sum(Amount), count(1) from Orders group by year(OrderDate), sellerid
When using Lambda expressions, SQL code is short and Stream code is lengthy. When grouping data, a SQL expression is intuitive and easy to understand while Stream uses the hard-to-understand nested Lambda expression. In an unambiguous context, SQL can directly reference a field name without preceding it with the corresponding table name, while Stream must write the table name explicitly in the format of "table name.field name", which is not smart enough. SQL groups data using group by and calculate sum with sum, without any unnecessary actions, while Stream implements the grouping operation through the cooperation of groupingBy, collect and Collectors and the sum operation using both summarizingDouble and DoubleSummaryStatistics, during which each action involves more than one function and a lot of coding work.
To examine more computations like set-oriented ones and joins, we find the same phenomenon. Stream is not as concise as SQL in achieving data computations and is not suitable to implement today’s application framework.
The reason behind Stream’s inadequate computing ability is that its host language, Java, is a compiled language and cannot provide specialized structured data objects. This makes Stream lack solid low-level support. A compiled language must define the structure of a result value in advance. A multi-step computing process involves definitions of multiple data structures, which makes the code rather complicated and parameter handling inflexible. A set of complicated rules is thus used to implement the Lambda syntax. As an interpreted language, SQL naturally supports dynamic structure and can conveniently specify a parameter expression as a value parameter or a function parameter, implementing Lambda syntax in a much simpler way.
Kotlin reforms Stream to have stronger computing ability
Kotlin is a Java-based advanced programming language. The advancement is mainly manifested in the improvement of Java syntax, particularly for Stream. This gives Stream more concise Lambda expressions and more set functions, as well as the eager evaluation (which is the opposite of Stream’s lazy evaluation).
To calculate intersection of simple sets:
var numbers=listOf(3,11,21,27,9)
var others=listOf(2,11,21)
var result=numbers intersect others
As you see in examples in the previous section, Stream does not support calculating intersection while Kotlin has a rich set of set-oriented functions for performing intersection between simple sets; Stream functions are conventional prefix functions that have non-intuitive computing processes while Kotlin adds SQL-like infix functions, like Intersect, which enable simpler and more intuitive computations; Stream requires converting a regular set (List) to a lazy set (Stream) for computations, which is inconvenient to reuse, while Kotlin can directly compute an eager set and enable convenient data import/export, data reuse and type conversion.
To sort records:
var resutl=Orders.sortedBy{it.Amount}.sortedByDescending{it.Client}
In the above code, different from Stream that specifically specifies the sorting field’s data type, Kotlin sorts records directly by the specific field; unlike Stream that hardcodes the sorting direction, it expresses sorting direction directly with a function; and it directly uses “it” as the default parameter of a Lambda expression without specifically defining a parameter as Stream does. The overall Kotlin code is much shorter and has better computing performance than Stream code.
Yet Kotlin is still not the ideal replacement of SQL
Kotlin has made some improvements on Stream, but still offers complicated rules of Lambda expressions, not enough functions and functionalities, and limited increase in the overall computing capability, which is far weaker than SQL.
Still take the sorting operation as an example. Kotlin provides “it” as the default parameter (table name), but the table name is not necessary for SQL, which only needs to know the field name. The Kotlin sorting function can only sort one field at a time. To sort data by multiple fields, it needs to call the function multiple times. SQL, however, is able to receive multiple fields dynamically and just needs to call the sorting function once.
To group and summarize data:
data class Grp(var OrderYear:Int,var SellerId:Int)
data class Agg(var sumAmount: Double,var rowCount:Int)
var result=Orders.groupingBy{Grp(it.OrderDate.year+1900,it.SellerId)}
.fold(Agg(0.0,0),{
acc, elem -> Agg(acc.sumAmount + elem.Amount,acc.rowCount+1)
})
.toSortedMap(compareBy<Grp> { it. OrderYear}.thenBy { it. SellerId})
result.forEach{println("group fields:${it.key.OrderYear}\t${it.key.SellerId}\t aggregate fields:${it.value.sumAmount}\t${it.value.rowCount}") }
In the above code, a grouping & aggregation action involves multiple functions, including the complicated nested function. Here Kotlin does not make much difference. It even needs to put table name before each field name, which is as inconvenient as Stream. Like in Stream, a Kotlin grouping & aggregation result is not a structured data type and the language should define the data structure for each intermediate result beforehand. To group data by two fields, for instance, Kotlin, like Stream, still needs to define data structures for the two fields. SQL supports dynamic data structure and does not need to define a structure for an intermediate result.
By looking at more computations, such as joins and merges, we find that even though Kotlin code is shorter than the Stream equivalent, all the Stream steps appear in it. It is not nearly as concise as the SQL counterpart. The language does not have the necessary abilities to replace SQL and to implement contemporary application frameworks.
The fundamental reason for Kotlin’s insufficient computing ability is that, as Java, it is a compiled language that does not support dynamic data structure and cannot offer specialized structured data objects. Without special-purpose structured data object, it is hard to truly simplify the Lambda syntax, and it is impossible to reference a field directly (table name is a must as in the format of “UnitPrice*Quantity”) and to perform multifield-based computations (like multifield sorting) dynamically – instead you need to specifically code them on most occasions.
In order to replace SQL in Java applications to perform computations while achieving the expected architectural advantages, esProc SPL is a wise choice.
With powerful computational capability, SPL can replace SQL
esProc SPL is an open-source structured computation language under JVM ecosystem. With specialized structured data objects, a rich set of built-in functions, a large number of date and string functions, support of SQL syntax, integration-friendly JDBC driver, as well as outstanding computational capability, SPL can take the place of SQL in Java applications.
Specialized structured data types
Logics of structured computations revolve around structured data objects. Stream/Kotlin’s structured data objects are amateurs that do not support dynamic data structure. Like SQL, SPL structured data objects are specialized and support dynamic data structure.
Get values of a specific field in a record: =r.AMOUNT*0.05
Modify values of a specific field in a record: =r.AMOUNT= T.AMOUNT*1.05
Get a column from a table sequence: T.(AMOUNT)
Append a record to a table sequence: T.insert(0,31,"APPL",10,2400.4)
Get a column by name and return a simple set: T.(AMOUNT)。
Get multiple columns and return a set of sets: T.([CLIENT,AMOUNT])
Get multiple columns and return a new table sequence: T.new(CLIENT,AMOUNT)
Get data by field name and then by record number: T.(AMOUNT)(2), which is equivalent to getting data by record number and then by field name: T(2).AMOUNT
Besides, SPL table sequences has many advanced functionalities, such as getting TopN, getting values in a zigzag way, and order-based join.
SPL has rich built-in functions for basic calculations
Stream/Kotlin functions are small in number and weak in ability. They need a lot of coding work to accomplish computations. SPL has a great number of built-in functions that can achieve basic calculations using one line of code.
For instance, sorting: =Orders.sort(-Client, Amount)
Like SQL, SPL does not need to precede a field with corresponding table name, specify data type for the sorting field, and define sorting direction with an extra function. Also, it can use one function to sort multiple fields dynamically.
Grouping & aggregation: =Orders.groups(year(OrderDate),Client; sum(Amount),count(1))
Still like SQL, there are no extra functions in each piece of SPL code, and uses of sum and count functions are concise and easy to understand, with little traces of nested anonymous functions. With a double-field grouping or summarization, SPL does not need to define the data structure beforehand. The result set grouping & aggregation is still a structured data object, which can be directly computed in the next step.
As SPL uses a variable to represent the intermediate result, its stepwise computing is more convenient than SQL’s non-step-wise computing. To get two subsets and perform intersection between them, for instance:
=Orders1 ^ Orders2
The character ^ in the above code is an infix operator representing intersection. It is convenient than the infix function in both Kotlin and SQL. Similar SPL operators include & for union, \ for difference, and the special operator | for concatenation.
Same simple coding for other calculations:
Distinct: =Orders.id(Client)"
Fuzzy query: =Orders.select(Amount*Quantity>3000 && like(Client,"*S*"))
Join: =join(Orders:o,SellerId ; Employees:e,EId).groups(e.Dept; sum(o.Amount))
SPL also supports basic SQL syntax for the convenience of database programmers.
//filter
$select * from d:/sOrder.xlsx where Client like '%S%' or (Amount>1000 and Amount<=2000)
//sort
$select * from sales.xls order by Client,Amont desc
//distinct
$ select distinct(sellerid) from sales.xls
//group by…having
$select year(orderdate) y,sum(amount) s from sales.xls group by year(orderdate) having sum(amount)>=2000000
//join
$select e.name, s.orderdate, s.amount from sales.xls s left join employee.csv e on s.sellerid= e.eid
SPL supports most of the syntax in SQL-92 standard, such as set operations, case when, with and nested query. Learn more about SPL simple SQL in Open-source SPL That Can Execute SQL without RDB.
A rich set of date and string functions
SPL offers much more date functions and string functions than both Stream/Kotlin. There functionalities are even stronger than SQL and they enable much shorter code. To calculate a specified number of quarters before/after a specified date with a time function, for instance:
elapse("1980-02-27",-3) //Return1979-05-27
Find the day of the week of a specified date: day@w("2020-02-27") //Return 5, which means Thursday.
Get a date after N workdays of a specified date: workday(date("2022-01-01"),25) //Return 2022-02-04
String functions:
Check if a string all consists of numbers: isdigit("12345") //Return true
Get the string before a specified substring: substr@l("abCDcdef","cd") //Return abCD
Split a string into an array of strings by the vertical bar: "aa|bb|cc".split("|") //Return ["aa","bb","cc"]
SPL also offers functions to get a date before or after a number of years, get the ordinal number of a date in the year, get which quarter the date belongs to, split a string according to a regular expression, get the where or select part of a SQL statement, get the words from a string, split HTML by the specific marker, etc.
SPL offers JDBC driver to be seamlessly invoked by Java code
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str="=T(\"D:/Orders.xls\").groups(year(OrderDate),Client; sum(Amount))";
ResultSet result = statement.executeQuery(str);
SPL’s remarkable computational capability and ease of integration make it convenient to implement today’s application frameworks.
SPL has greater computing ability than SQL
It is hard for SQL to achieve relevant computations because it lacks natural sequence numbers and does not implement set-orientation thoroughly. SPL intrinsically has sequence numbers and carries out more complete set-orientation, enabling it to have powerful order-based and set-oriented computing ability. Not equipped with the procedural control ability, it is hard for SQL to achieve complex business logics. With convenient and flexible procedural control design, SPL is able to simplify complex business logics.
Order-based computations & set-oriented operations
SQL does not have natural sequence numbers. Some databases provide the pseudo field feature, but their uses are unintuitive and roundabout. To find records whose sequence numbers are between 10 and 20, we have a subquery as follows:
select * from (select rownum no,id,name from student where rownum<=20) where no >10
SPL intrinsically has sequence numbers. It makes such a query conveniently and intuitively:
student.select(#>10 && #<=20)
Let’s look at more examples. To get multiple columns according to their numbers and return a new table sequence: T.new(#2,#4)
Get records backwards by sequence numbers: T.m(-2)
Get records by sequence numbers and return a new table sequence: T([3,4,5])
Get records within a certain range and return a new table sequence: T(to(3,5))
It is rather difficult to achieve inter-row, order-based computations. Without ready-to-use sequence numbers, SQL uses join operation or the window function to deal with them. The language manages to get simple ones done, like calculations of link relative ratio and YOY. But it becomes really awkward when computations are complicated, such as counting the longest days when a stock rises continuously, SQL generates very complicated code:
The code for doing the task is complicated:
select max(days_of_continuous)
from (select count(*) days_of_continuous
from (select sum(sign_of_updown) over(order by transDate) days_of_not_up
from (select transDate,
case when
price>LAG(price) over(order by transDate)
then 0 else 1 end sign_of_updown
from share) )
group by days_of_not_up)
SPL can reference a field through its relative position, making it simple to achieve inter-row computations and enabling it to automatically handle special cases, such as out-of-range values in an array. The language achieves the above computation with much simpler code:
A |
|
1 |
=orcl.query@x(select price from stock order by transDate) |
2 |
=t=0,A1.max(t=if(price>price[-1],t+1,0)) |
SQL’s incomplete set-orientation makes it unable to put the intermediate set-type result on hold but to calculate the final result at one go. The code is complicated and hard to understand. SPL makes thorough set-orientation, which lets it use a variable to keep an intermediate set for the next-step computation. The design helps to produce clear and simple code. For instance, to find employees who were born on the same day, we can group records and keep the subsets for further computation, as shown below:
A |
|
1 |
=demo.query(“select * from emp”).group(month(birthday),day(birthday)) |
2 |
=A1.select(~.len()>1).conj() |
Procedural control ability
Business logics often involve many procedural controls, but SQL lacks the relevant ability and needs to turn to stored procedure. The stored procedure is more tightly knitted with the database and harder to migrate. This makes a worse framework. SPL offers all-around procedural control abilities for simplifying complex business logics. Below is a branch statement:
A |
B |
|
2 |
… |
|
3 |
if T.AMOUNT>10000 |
=T.BONUS=T.AMOUNT*0.05 |
4 |
else if T.AMOUNT>=5000 && T.AMOUNT<10000 |
=T.BONUS=T.AMOUNT*0.03 |
5 |
else if T.AMOUNT>=2000 && T.AMOUNT<5000 |
=T.BONUS=T.AMOUNT*0.02 |
The SPL loop statement:
A |
B |
|
1 |
=db=connect("db") |
|
2 |
=T=db.query@x("select * from sales where SellerID=? order by OrderDate",9) |
|
3 |
for T |
=A3.BONUS=A3.BONUS+A3.AMOUNT*0.01 |
4 |
=A3.CLIENT=CONCAT(LEFT(A3.CLIENT,4), "co.,ltd.") |
|
5 |
… |
SPL can also use the break keyword to exit (terminate) the current loop body, or the next keyword to skip (ignore) the current round of loop. Here we just skip the topic.
SPL provides great ability to handle order-based and set-oriented computations, and convenient and flexible procedural control ability, enabling it to accomplish complex logics independently and letting SQL focus only on data retrieval and storage.
SPL supports optimal application framework
Besides excellent computational capability, SPL has unique advantages for system architecture, data source support, intermediate data storage and performance enhancement. This helps SPL to achieve more optimal application framework.
Interpreted execution and hot swap
As the previous example shows, simple SPL code can be embedded into Java code. If the SPL code is complicated or needs frequent modifications, it can be saved as a script file, placed outside the Java program and invoked by the file name. SPL, as an interpreted language, can execute in real-time after any changes without recompilation and restarting the Java service. For instance:
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call runSub()");
Placing SPL code outside the Java program can effectively reduce the system coupling.
SPL supports diverse data sources and cross-data-source/cross-database mixed computations
To computed data coming from a non-RDB data source or the mixed sources of RDB and non-RDB in the conventional application frameworks, we generally need to load data in the RDB source to RDB for further computation. The framework adds extra coupling between the computing module and the database and makes the code hard to migrate. SPL can directly compute data originated from any type of data source and supports the mixed computation between RDB and non-RDB, decoupling the computing module from the database, granting data sources the uniform computing ability, and enabling code to be migrated seamlessly across them.
SPL supports various types of databases and files (like txt\cs\xls), NoSQL databases, including MongoDB, Hadoop, Redis, ElasticSearch, Kafka and Cassandra, as well as multilevel data, such as WebService XML and Restful Json:
A |
|
1 |
=json(file("d:/Orders.json").read()) |
2 |
=json(A1).conj() |
3 |
=A2.select(Amount>p_start && Amount<=p_end) |
To perform a cross-data-source join between a text file and the database in SPL:
A |
|
1 |
=T("Employees.csv") |
2 |
=mysql1.cursor("select SellerId, Amount from Orders order by SellerId") |
3 |
=joinx(A2:O,SellerId; A1:E,EId) |
4 |
=A3.groups(E.Dept;sum(O.Amount)) |
SPL offers proprietary storage format to store data temporarily or permanently and to enable high-performance computations
SPL supports btx storage format for temporarily storing data coming from slow data sources, like CSV:
A |
B |
|
1 |
=[T("d:/orders1.csv"), T("d:/orders2.csv")].merge@u() |
/ Union records |
2 |
file("d:/fast.btx").export@b(A1) |
/ Write to bin file |
A btx file is small and fast to read and write. It can be computed as an ordinary text file:
=T("D:/fast.btx").sort(Client,- Amount)
Storing ordered data in a btx file can help obtain high performance for computations like parallel processing and binary search. SPL also supplies ctx storage format that brings extremely high performance. The ctx format supports data compression, column-wise/row-wise storage, distributed computing and large concurrency, fit for storing a massive amount of data permanently and achieving high-performance computations.
Both Stream and Kotlin are based on a compiled language. They offer crude structured data types and insufficient computing ability, which puts it in no position to replace SQL. SPL offers specialized structured data types, becoming qualified to displace SQL, and is easy to be integrated into Java. It has features that help achieve the contemporary application frameworks that allow business logic implemented within them and make SQL responsible only for database read and write. SPL provides powerful ability to handle order-based and set-oriented computations and control computing procedures. The language comprehensively outperforms SQL and is able to let SQL zero in on data read and write. In addition, SPL supports hot swap, diverse data source, proprietary storage format and high-performance computing, which help further optimize the application framework.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version