Order-based Aggregation by Intervals

Question

The following data table has two fields – a and b.

a

b

10

3

21

4

33

6

46

7

52

10

 

This is the result I want:

a<10

sum(b)

a<20

sum(b)

a<30

sum(b)

a<40

sum(b)

a<50

sum(b)

 

Values of a field can be divided into over 1000 intervals.

 

Answer

To aggregate data by intervals in SQL:

with

cr as

(

select “a<10” x, b b from test3 where a < 10 union all

select “a<20” x, b b from test3 where a < 20 union all

select “a<30” x, b b from test3 where a < 30 union all

select “a<40” x, b b from test3 where a < 40 union all

select “a<50” x, b b from test3 where a < 50

)

select x,sum(b) from cr group by x;

 

It’s complicated when there are many conditional intervals.

 

SPL can divide a series of numbers into multiple intervals dynamically to generate a conditional sequence, perform aggregation by each condition, and generate the resulting table sequence. No matter the number of the intervals, the SPL solution is always a piece of serval-line code:

A

1

=connect(“db”)

2

=A1.query(“select   a,b from test3 order by a”)

3

=(A2.m(-1).a – 1)\10+1

4

=A3.new(~*10:C,A2.select(a<C).sum(b):D)

 

A1: Connect to database db;

A2: Retrieve data from the data table and order data by a field;

A3: Find the last interval number. A2.m() function finds a certain record from A2’s table sequence with the parameter being the record’s sequence number; a negative number represents a reciprocal. Here we get the value of a field in the last record and find the interval where the largest value of a field falls. The last a value – 52– is contained in the interval [51,60], whose sequence number is 6;

A4: new() function generates a new table sequence by calculating field values of A3’s record sequence. Each parameter corresponds to a new field value; the colon introduces a new field name. As the result of A3 is 6, A3.new() loops over the records 6 times; the sign ~ represents the loop number. Values of C field are loop number*10; values of D field are sums of the results of query performed over A2’s table sequence according to the condition a<C.

 

A2

A3

A4