Optimization
Programs rarely reach their theoretical maximum speed. Getting close to it is a difficult discipline, requiring mathematical insight and often detailed knowledge of the specific domain in combination with the problem at hand. What works well for one program may not be ideal for another because the data differ.
During my career, I have encountered four fundamental points that commonly appear as weaknesses in both production applications of large companies and scientific team research.
Time Complexity
Imagine you want to sort 1,000 items, for example, employee names in alphabetical order. You could do it simply by iterating through all employees, finding the one that comes first, and placing them at the beginning. Then you repeat with the remaining 999, and so on. In the end, you have a sorted list. However, a closer analysis shows that this method took about 1,000² = 1,000,000 operations. This algorithm has what is called quadratic complexity. With 10,000 employees, it reaches 100 million operations. With one million, it’s a trillion, which is clearly unsustainable.
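The quadratic method described above can be sketched as a selection sort; the function name and sample data are illustrative only:

```python
# Selection sort: the quadratic approach described above.
# For each position, scan the remaining names for the alphabetically
# first one and swap it into place -- roughly n^2 comparisons overall.
def selection_sort(names):
    names = list(names)  # work on a copy, leave the input untouched
    n = len(names)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            if names[j] < names[smallest]:
                smallest = j
        names[i], names[smallest] = names[smallest], names[i]
    return names

print(selection_sort(["Novak", "Adams", "Brown"]))  # ['Adams', 'Brown', 'Novak']
```

Doubling the list length quadruples the number of comparisons, which is exactly why this approach collapses at scale.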
An optimal algorithm is not as obvious. It relies on randomly selecting an employee as a pivot and splitting the remaining 999 into piles to the left or right based on whether they come before or after that name in the alphabet. The process repeats within each pile until every pile holds a single name, effectively sorting the list through a binary tree of splits. The resulting complexity is n · log₂ n. So, roughly 10,000 operations for 1,000 employees instead of a million, and only 20 million operations for a million employees instead of a trillion.
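The pivot-and-split idea is classic quicksort, which can be sketched in a few lines (a minimal illustration, not a production implementation):

```python
import random

# Quicksort: pick a random pivot, split the rest into "comes before"
# and "comes after" piles, and recurse on each pile.
# Average cost: about n * log2(n) comparisons.
def quicksort(names):
    if len(names) <= 1:
        return list(names)
    pivot = random.choice(names)
    left = [n for n in names if n < pivot]
    middle = [n for n in names if n == pivot]
    right = [n for n in names if n > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort(["Novak", "Adams", "Brown"]))  # ['Adams', 'Brown', 'Novak']
```

In practice you would simply call the built-in `sorted()`, which already uses an optimal algorithm; the point is to recognize when hand-rolled code silently falls back to the quadratic variant.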
This is just an introductory task in computer science. Today, time complexity analysis is often overlooked in practice because most data-processing libraries already contain optimal algorithms. Yet I still encounter blocks of code, especially from less experienced programmers, where data is sorted with trivial, inefficient algorithms written by hand instead of using a library. It works perfectly on their machine with ten test records, right up until it is launched in production.
Slow Processor
This problem is mainly seen in cloud environments. A program runs on a cheap, mass-produced, low-performance processor chosen to save on cloud unit costs, or one that was inadvertently left as the default option. Switching to a better processing unit tripled the speed of a critical service in the core architecture I recently worked on.
For a CPU upgrade to be effective, it’s necessary to verify that the service in Cloud Run or Kubernetes is indeed bottlenecked by the CPU; some services are bottlenecked by memory, disk, or network instead. Services where such acceleration does apply include sound and image processing, neural network preprocessing, or even pushing data through the network stack itself (though for that last case, the next section may help more).
Slow Neural Network Performance
Even when using third-party models, ensure they run on the right hardware and with appropriate quantization settings (network precision). In my analyses, I estimate roughly a 40-fold speed increase when running on a GPU compared to a CPU. However, at a cost of around $1,000 per month, it is completely unsuitable to run a small network for distinguishing cats from dogs this way. It’s important to consider the triangle of cost, speed, and accuracy, keeping in mind that an application almost certainly won’t achieve high accuracy and high speed at a very low operating cost. The task is to find the optimal point within this triangle.
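To make the quantization idea concrete, here is a minimal NumPy sketch of int8 weight quantization; the function names and the toy weight matrix are illustrative assumptions, and real frameworks handle this internally:

```python
import numpy as np

# Per-tensor int8 quantization: map each float32 weight to one of
# 255 signed levels via a single scale factor. Inference can then
# run in cheap integer arithmetic; results are rescaled to float.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)

# The precision lost is bounded by half a quantization step.
err = np.abs(w - dequantize(q, scale)).max()
```

The trade-off is exactly the triangle above: int8 weights are 4x smaller and faster to move through memory, at the price of a small, bounded loss of precision.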
Lack of Vectorization
A team of programmers gets a great model for predicting prices. They send data one product at a time, one day at a time. The evaluation takes six hours, so they come up with various automated tasks that recalculate everything overnight. An experienced ML engineer then runs the prediction on all products at once with a sliding time window and gets results in two minutes.
Honestly, this might be the most common problem I encounter. NumPy has excellent tools for reducing many problems to mere matrix multiplication, which it handles superbly. I managed to speed up a finished program by a factor of a thousand using this approach. Why not take advantage of it?
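The pattern can be sketched with a toy linear model; the product features, weights, and function names below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
features = rng.standard_normal((100_000, 20))  # one row per product
weights = rng.standard_normal(20)              # toy linear model

# "One product at a time": a Python loop over rows, slow.
def predict_loop(X, w):
    return [float(sum(x_i * w_i for x_i, w_i in zip(row, w))) for row in X]

# Vectorized: one matrix-vector product scores every product at once.
def predict_vectorized(X, w):
    return X @ w

preds = predict_vectorized(features, weights)
```

Both functions compute identical numbers; the vectorized version simply hands the whole batch to optimized BLAS routines instead of the Python interpreter, which is where speedups of several orders of magnitude come from.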
Conclusion
By addressing these points, you can significantly enhance the performance and efficiency of your programs, whether in production or research environments. If you need assistance in optimizing your code or infrastructure, I am available to help.