What are the difficulties in self-service associated queries?
Things are universally interconnected, and many queries with business significance also involve the association of multiple data tables.
BI software usually provides self-service query function, and some software can also support associated queries, but most of them are actually single table, and the associated query function is rarely used by business personnel. Queries involving associated tables often require technical personnel to prepare in advance, which is commonly referred to as wide table. Business personnel usually only perform queries based on a single wide table. Associated queries are the weak point of almost all BI software, whether it’s a big name or a newcomer, almost every time they try, they will be all defeated.
Why is this happening?
Because many people do not know how to use the multi table associated query function provided by these softwares.
To understand this issue, we need to start with how databases perceive associations.
Data is often stored in relational databases (or similar logical structures), typically using SQL for querying. The definition of JOIN in SQL is very simple. When two tables are associated, simply provide the corresponding associated fields, without any further information or conventions.
This simple rule may make many people dizzy when there are many associated tables.
If there is only a unique association between the two tables involved in the association, then as long as the corresponding association fields are set, most people will not be confused. These situations, we refer to as simple associations.
But sometimes there may be situations where the association is not simple, especially when there are many associated tables, users may faint.
A common situation is that there are multiple association relationships between two tables, which we refer to as repeated associations.
For example, in the call record table, there will be the calling and called numbers, and we need to associate these numbers with the phone account table to obtain the relevant information of the numbers (such as registration location) as query criteria. For example, if we want to query the call records of all NY to LA numbers, we need to associate the call record table with the phone account table. The phone account table needs to be associated twice, using the calling and called numbers in the call record table respectively, in order to retrieve the registration places of the calling and called numbers separately. Firstly, it is necessary to associate the same phone account table twice, which a considerable portion of the software does not support at all; Secondly, it is necessary to extract the registration location field twice separately, and distinguish whether it is associated with the calling number or the called number. This requires assigning different aliases to the phone account table to distinguish (SQL does this), which can make non professionals feel confused.
Repeated associations may also result in being followed and copied. Our example has actually been simplified. Usually, the phone account table does not directly store locations at the NY or LA level. Instead, a location number is associated with the region table, and the city region corresponding to the location number can be retrieved from the region table(or even several levels). Then in the above association process, this region table will also be associated twice and needs to be aliased to distinguish. If the region is further graded (which is actually a common occurrence), there will be more tables associated with it twice. When there are slightly more associated tables, even technical personnel need to be careful to understand them, and business personnel will basically be unable to understand them.
Using this region table, we can easily give another example of a frustrating query: a call made to a LA number after roaming from NY to DC, which can be easily done in the database. The call record table contains base station information, and the base station table can be associated with the region table to obtain the location of the call. However, such complex association relationships are completely incomprehensible to business users on the BI interface.
A more troublesome situation is the mutual association.
Employee table and department table are in the personnel system. The department field in the employee table is associated with the department table. The department will have a manager, and the manager is also an employee. The manager field in the department table will be further associated with the employee table. This leads to a situation of mutual association, causing a circle.
Now we need to identify the male employees under the female manager, just think about what SQL would look like. Associate the employee table with the department table to obtain the department manager, and then switch back to associate with the employee table to obtain the manager’s gender. The employee table appears twice and needs to be aliased in order to distinguish whether the gender field extracted from the employee table belongs to the employee under investigation or his/her manager.
With this simple example, many BI software that claims to be invincible can be revealed in its original form. A considerable portion of those that do not support repeated associations simply fall dead, and even some that can barely manage it will be very convoluted. Business personnel cannot understand it at all, and it is still equivalent to not being able to handle it.
The extreme situation of mutual association can also become self-association.
For example, the region mentioned earlier may be graded, and a graded region table may not be made into multiple tables, but only one table with a field representing its superior region (number). This is a common data structure design, but it also means that the region table will be associated with itself. From the lowest level business office (or base station) to the state-level region, there may be as many as three or five levels, and this table will have to be repeatedly associated three or five times, with three or five aliases to distinguish. Do you think business personnel will be dizzy?
The reason for these phenomena is that, as mentioned earlier, SQL’s definition of JOIN is too simplistic, and overly simplistic definitions cannot capture the fundamental characteristics of associations. The simple definition makes it easy to understand, but describing reality can be very complex. Just like only learning addition is easy, but trying to describe multiplication as addition can be cumbersome. We are only giving examples of foreign key tables here. If we mix the homo-dimensional table and primary sub table (although these two situations are not more complex in themselves), things will become extremely complicated (writing SQL will bring sub queries, while previously there were just too many JOINs). We cannot accuse business personnel of being stupid. In this association model, when associating with seven or eight tables (which is common for large businesses), technical personnel often make mistakes, let alone business personnel who are not familiar with the database. As a result, these softwares provided association query capabilities but were not effective, so it is still necessary to trouble technical personnel to create a wide table. But in this way, you have to seek help from the IT department for everything, making it difficult to do online analysis.
Then what should we do?
Well, just pay attention to DQL in esProc SPL to find the answer.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version