SPL Programming - 5.6 [Sequence as a whole] Sorting related
In order to solve the problem of digital black hole, we have written several sort codes. Sorting is really a very common action, so SPL provides sorting function directly.
A.sort() will return a sequence that arranges the members of A from small to large. Of course, we often use sorting from large to small, just use the @z option.
The sorting function returns a sequence composed of the original members of the sequence, which can be understood as a type of selection function. It also has a corresponding positioning function, A.psort()will return a number sequence p, so that A(p) is arranged from small to large. In other words,
A.sort()==A(A.psort())
Please try to understand the meaning of this equation.
Try it. As a rule, the result of column A is the constant of column B.
A | B | |
---|---|---|
1 | [13,30,45,23,42,98,61] | |
2 | =A1.sort() | [13,23,30,42,45,61,98] |
3 | =A1.sort@z() | [98,61,45,42,30,23,13] |
4 | =A1.psort() | [1,4,2,5,3,7,6] |
5 | =A1(A4) | [13,23,30,42,45,61,98] |
With the sorting function, the digital black hole can be simpler, without using pseg()and insert():
A | B | C | D | |
---|---|---|---|---|
1 | 5 | 12345 | =A1.run(~=if(#>1,~[-1]*10,1)) | |
2 | for | =C1.(B1\~%10).sort() | ||
3 | =C1.sum(B2(#)*~) | =C1.sum(B2.m(-#)*~) | ||
4 | =@|B1 | >output(B1) | >B1=B3-D3 | |
5 | if B4.pos(B1) | break |
Sorting function also supports the case of expression A.sort(x), but it is not equal to A.(x).sort(). It is defined by psort(), that is, A.sort(x)==A(A.psort(x)), and A.psort(x) can be defined by A.(x).psort().
This is the same as the case of maxp(), which is the common feature of selection function and positioning function. The same is true for select() function, that is, A.select(x)==A(A.pselect@a(x)), not A.(x).select().
However, SPL does not provide a function that is consistent with A.max(x) style and can directly return A.(x).sort(), so this requirement can only be calculated with this expression. This is caused by historical habits. In the early programming languages, sort() function was designed in this way.
Sorting is generally used to get a neat order, but it can also be used to disrupt the order of a sequence.
A | B | |
---|---|---|
1 | =to(1000) | =A1.sort(rand()) |
The rand() function in B1 will return a random floating-point number between 0 and 1 in each calculation. The random number is disordered, and the order of sorting is disordered. As a result, A1, which was originally very neat, is arranged in a disordered way. If we develop a poker game, this principle can be used to shuffle cards.
The method of disrupting the order can also be used for sampling. For example, 100 numbers should be randomly selected from the range of 1 to 1000, which cannot be repeated. It’s not easy to generate them randomly, and it’s troublesome to regenerate them if there is repetition. It is very simple using the above sorting method, just getting the top 100 after disrupting the order.
A | B | |
---|---|---|
1 | =to(1000) | =A1.sort(rand()).to(100) |
We’ll explain a little more about the rand() function. It is very convenient to use random numbers to generate test data. However, when we find an error in the test, we hope to run the same set of test data again to find out where the error is. But, random numbers are random, and another batch is generated after the second calculation.
In fact, there are no real random numbers in the computer, but a series of calculations are carried out to get new numbers from the numbers generated last time. When the calculation rules meet some mathematical principles, it is difficult to see the pattern, and it can also ensure that the calculated numbers can be evenly distributed within the specified range. These random numbers are called pseudo-random numbers. As long as the first number is determined, the sequence of the numbers calculated is determined.
Using this principle, we can create the same “random numbers”.
A | B | |
---|---|---|
1 | =rand@s(1) | =100.(rand()) |
2 | =100.(rand()) | |
3 | =rand@s(1) | =100.(rand()) |
rand@s(s) function sets the parameter s to the first random number, called the seed. Running this code shows that B1 and B3 are the same.
So, if the seeds used at the beginning of each program execution are same, then the generated random number sequences will all be the same?
No, when the program starts, it will automatically execute a statement like rand@s(long(now())), now() returns the current time, that is, the start time of program execution is used as the seed. This is operated by human and has enough randomness, which can ensure that different random number sequences will be obtained before each program execution.
Looking back, psort()is not only used to define sort(), but also useful. Look at this code:
A | B | |
---|---|---|
1 | =50.(1000+~) | =A1.(rand(100)) |
2 | =B1.psort@z() | =A1(A2) |
The rand(n) in B1 means to return a random integer below n.
Regarding A1 as the student numbers of a group of students (Actually we should use names, but wait until we learn about strings. We just use integers as student numbers for now). B1 can be understood as the scores of these students, which correspond to the members of A1 one by one. Now we want to get the result sequence of these student numbers sorted by their corresponding scores.
It’s obviously useless to sort A1, while sorting B1 can sort out the score itself, but it can’t get the sorting of student numbers. Using psort() can solve this problem. Now B2 is the student number sequence that we want to sort by score.
When the number of members is large, the sorting of the whole sequence is slow and often meaningless. For example, with the sales of tens of thousands of stores, we usually only care about the top few and the corresponding sales amounts, and we are not interested in the situation after hundreds or thousands. And to get the top few, there is no need to sort the whole sequence.
SPL provides the top()function to get only the first few. Similarly, there is the ptop() function, but the syntax rules are a little different from the sort()function. Because the top() function has three cases similar to the max()function: max() that returns the value, positioning function pmax()and selection function maxp().
A | B | |
---|---|---|
1 | [13,30,45,24,42,98,61] | |
2 | =A1.ptop(3) | [1,4,2] |
3 | =A1.top(3) | [13,24,30] |
4 | =A1.ptop(3,~%10) | [2,7,5] |
5 | =A1.top(3,~%10) | [0,1,2] |
6 | =A1.top(3;~%10) | [30,61,42] |
The positioning function A.ptop(n) returns the positions of the smallest n members. When there is a parameter, A.ptop(n,x) is defined as A.(x).ptop(n). It is similar to pmax(), which is not difficult to understand. A.top(n) can be defined as A(A.ptop(n)). When there is no x parameter, it is both a value and a member. It can be analogized to max or maxp()without ambiguity. However, with the x parameter, A.top(n,x) is defined as A.(x).top(n), which is equivalent to max(), is for the value. The selection function is represented by A.top(n;x), and it corresponds to maxp(). Note that a semicolon is used to separate parameters instead of a comma.
Maybe because the word topp really does not look good, it is not used. Here n can’t be omitted. In SPL, there are a lot of cases where semicolons are used to distinguish parameters (we will talk about it later), so that we can distinguish them.
Generally speaking, the function style designed in SPL is relatively consistent, and it is easy to make an analogy between different operations.
SPL Programming - Preface
SPL Programming - 5.5 [Sequence as a whole] Positioning and selection
SPL Programming - 5.7 [Sequence as a whole] Lambda syntax*
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL