Performance Optimization - Preface
The technical essence of big data is high performance. Only with sufficient performance can big data analysis truly be put into practice.
Performance optimization has to be carried out under limited hardware conditions. Software cannot make hardware faster. What we can do is design algorithms with lower complexity that reduce the actual amount of computation, which in turn yields higher computing performance.
Some big data algorithms adapt well and work in all cases, but they tend to be conservative, and it is difficult to obtain high performance with them. To reduce the amount of computation, we should carefully study and exploit the characteristics of the data and the task, and design appropriate storage schemes and calculation methods according to actual conditions.
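As a minimal illustration of this idea, the sketch below (written in Python rather than SPL, with hypothetical function names and made-up data) compares a general-purpose scan with a search that exploits one known characteristic of the data: that the keys are stored in ascending order. It is only an assumed example of how a storage characteristic reduces the amount of computation, not one of the algorithms described later in the book.

```python
from bisect import bisect_left

def linear_search(keys, target):
    """General-purpose scan: works on any list, but may examine every key."""
    for i, k in enumerate(keys):
        if k == target:
            return i
    return -1

def sorted_search(keys, target):
    """Binary search: valid only when keys are stored in ascending order."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return i
    return -1

# Hypothetical data: one million even order IDs, stored in ascending order.
order_ids = list(range(0, 2_000_000, 2))

pos_scan = linear_search(order_ids, 1_000_000)
pos_sorted = sorted_search(order_ids, 1_000_000)
```

Both functions return the same position, but the scan compares up to every key while the binary search needs only about 20 comparisons for one million keys. The faster method is correct only because the data is known to be ordered, which is why the storage scheme and the algorithm have to be designed together.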
This book describes applicable storage schemes and optimization algorithms for different scenarios and objectives. Once programmers are familiar with the principles and application prerequisites of these basic algorithms, they can combine them flexibly to solve performance problems in business. Understanding these algorithms and their features also helps considerably in selecting and understanding big data products.
The algorithms in this book are mainly oriented to structured data computation, covering operations such as search, filtering, grouping, sorting and join. These are the basics of big data computing and the most common tasks in data analysis.
This book does not simply list and summarize historical algorithms; many of its algorithms and optimization techniques appear in a book for the first time in the industry. It discusses not only algorithms that are high-performance in theory, but also techniques that have no particular advantage in complexity yet improve performance in engineering practice.
This book is not for beginners. It expects certain professional knowledge from readers:
1) Be proficient in the operations of relational databases and SQL. The meaning of these operations will not be explained in this book.
2) Have knowledge equivalent to a university computer science course on data structures. The relevant concepts will be cited directly.
3) Understand the basics of algorithm complexity analysis.
4) Preferably be familiar with the C/C++ or Java programming language, the memory management mechanisms of operating systems, and the basics of local area networks (LAN).
The principles and processes of some algorithms are complicated and difficult. Application programmers do not have to master them; to use the algorithms, it is enough to understand the conditions under which they apply and to be familiar with the application code examples.
This book uses SPL to write the application code examples and directly uses SPL data types and syntax to describe calculation objectives, so readers need to learn SPL in advance. Readers with SPL knowledge can easily translate these terms into the corresponding vocabulary of other programming languages.
SQL is the most commonly used language for structured data operations, but it is too coarse-grained to apply most of the optimization algorithms in this book. Java, C/C++ and other programming languages lack the necessary concepts for structured data operations, and defining them from scratch would take up too much of the book. Moreover, although these languages can implement and apply the algorithms, the code would be quite long, and too much effort would go into details.
SPL may be the only programming language in the industry that can apply these algorithms without becoming too cumbersome. After understanding how these algorithms work, you can also implement them yourself in Java, C/C++ or other programming languages and obtain better performance.