Performance Optimization - 9.7 [Cluster] Multi-job load balancing

 

Performance Optimization - 9.6 [Cluster] Spare-wheel-pattern fault tolerance

Similar to the multi-thread parallel computing on a single machine, the multi-machine parallel computing framework described in the first section will also wait for the slowest node to return the result before proceeding. Although we can try to make the amount of data processed by each node more balanced, we cannot guarantee the execution speed of each node is the same, so it is still impossible to avoid the phenomenon that the fast node waits for the slow node. In most cases, such wait is not serious, but it is intolerable in pursuit of ultimate performance.

In theory, just like the way for handling multi-thread, we can split the task into smaller pieces and dynamically assign them to balance the load on each node and reduce the waiting time. However, the multi-machine situation is more complicated, and only with data redundancy can the dynamic load balance be achieved. If a piece of data is only stored on one node, then only this node can calculate the data, and other nodes have to wait even if they are faster, otherwise, the data needs to be transmitted over the network, which not only causes network latency but also consumes the computing resources of the node where the data is stored. As a result, the loss outweighs the gain. Since multiple threads on the same machine share the stored data, this problem does not exist.

A
1 [“192.168.0.101:8281”,“192.168.0.102:8281”,…, “192.168.0.104:8281”]
2 =callx(“sub.dfx”,to(1000);A1)

The callx()function can implement this mechanism. The sub.dfx in the parameters is the script on the node, which is used to calculate the job, with the job number as the parameter. A2 will generate 1000 jobs. First, a part of the jobs will be assigned to the nodes to make the number of jobs on each node reach its limit, and then the node that finishes a job will be assigned another job, hereby achieving dynamic load balance. When a calculation error occurs because the data for a certain job is not on the assigned node, the callx() function will reassign this job to another node. If no node is available to execute the job, the calculation fails.

At the end of calculation, A2 will get a sequence composed of the return values of every sub.dfx, which is ordered by job number.

There is another problem when the number of jobs is large. Since the results will be returned to the master computer after each job is finished, the number of returns will be very large, which will bring a heavy network burden. Fortunately, many calculation results can be summarized first by the nodes and then returned, so the number of returns will be reduced to the number of nodes.

The callx() function has another parameter that specifies the operation script for first aggregating.

A
1 [“192.168.0.101:8281”,“192.168.0.102:8281”,…, “192.168.0.104:8281”]
2 =callx(“sub.dfx”,to(1000);A1;“reduce.dfx”)
3 =A2.sum()

Assuming that the calculation task is to sum the return values of each sub.dfx, the code for reduce.dfx is:

A
1 return p1+p2

where, p1 and p2 are two parameters of reduce.dfx; p2 is the return value of each job, and p1 is the current aggregation value of the node. Each time the node finishes a job, it will call the reduce.dfx to get a new aggregation value to replace the current one. After the node finishes all jobs, it will send the last aggregation value as its return value to the master computer. At this time, A2 will get a sequence composed of return values of each node in the same order as the nodes in its parameter (i.e., A1). The p1 and p2 here are a bit like ~~ and ~ in the iteration function.

In distributed computing terminology, such aggregation action performed by node is called reduce.


Performance Optimization - Preface