Patience is a virtue. We’re all used to waiting in line, being patient for our turn to come. The alternative is a lot of squabbling about queue jumpers etc. and would likely result in everyone waiting even longer, whether it’s queuing for check in, waiting to be served in a restaurant or so on. Is patience always a virtue though in other areas, in particular in work?
When it comes to coding, patience is sometimes a virtue, but more often than not, impatience is the greater virtue, because it pushes you to make code faster! Over the past few months since Alexander Denev and I cofounded Turnleaf Analytics, we have been building inflation forecasting models (and indeed if you’re interested in our next funding round, let us know)! The research process has involved many steps, including collecting data, cleaning it and a large amount of prototyping of different models in Python. We’ve been using machine learning as part of the exercise. Training machine learning models can be computationally intensive and hence can take time. After some work, we’ve managed to speed up the code significantly, which has been incredibly helpful for modelling and for reducing computation costs too.
Obviously rewriting everything in C++ could speed it up, but that kind of defeats the point of using Python in the first place! Whilst it is still a bit of an art to write fast code, there are many tips and tricks you can try to help you speed up your code more broadly. Whilst I’ve written articles like this before, I find that every time I do a new project I find new tricks for speeding up Python! Some of these tricks I teach in a Python workshop which includes a module on speeding up Python. Many of the ideas are also likely to be applicable to other programming languages. Obviously, you also need to weigh the time it takes to speed up your code against the benefits. If you’re only going to run a script once, you probably won’t feel the necessity to speed it up. However, if you are running it regularly and the process takes hours, you might feel it’s worth it.
The first thing I would suggest is to run a code profiler to identify where the particular bottlenecks in your code are. Otherwise, you might spend time “optimizing” code which doesn’t really need to be optimized. The end result can just make your code more difficult to read. More broadly, you need to weigh whether optimizing your code to the n-th degree will make it more challenging to maintain, so a balance needs to be struck.
I usually use the code profiler in PyCharm, which gives you a breakdown of the time it takes for a function to run, the number of times it is called etc. (although this isn’t available in the community version). There’s also a nice new open sourced tool from Bloomberg called memray to track the call stack, memory usage and memory leaks (although it doesn’t give you a breakdown of execution times from what I can see). You can also do code profiling in Jupyter notebooks.
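If you don’t have access to the PyCharm profiler, a similar breakdown can be sketched with cProfile from the Python standard library. This is a swapped-in tool, not the PyCharm profiler or memray, and the functions slow_sum_of_squares, run_model and profile_report below are hypothetical names made up purely for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a real bottleneck."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_model():
    """Hypothetical entry point that calls the hot function several times."""
    return [slow_sum_of_squares(50_000) for _ in range(20)]


def profile_report(func, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Calling profile_report(run_model) and printing the second element of the returned tuple gives a table with the number of calls and the time spent per function, which is usually enough to see where to focus. In a Jupyter notebook, the %prun and %timeit magics give a comparable view without any extra code.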
I find Jupyter notebooks great for teaching and also when prototyping models. However, when it comes to moving to production code, it’s a lot easier to refactor the code into Python scripts. There are all sorts of reasons for using scripts for production instead of notebooks: scripts are easier to put under version control, notebook cells can be executed in different orders (so results can depend on the order in which you ran them), code reuse is easier etc. You can also use Jupyter notebooks as a front end, with relatively small code snippets calling more substantial code in scripts, to get the best of both worlds.
Once you’ve found your performance bottlenecks you can think about ways of speeding them up. If you’re using Pandas extensively, can you rewrite the code using NumPy, and only convert to Pandas DataFrames later on? Is another library, such as Vaex, more appropriate than Pandas if you have a massive dataset (see this example notebook)? Can you use tools like Numba to speed up your NumPy calculations? I’ve found Numba very promising for many tasks (note, it doesn’t work with Pandas, just NumPy), and you can write “fast” for loops with it. More broadly, with for loops, can you move code out of an inner loop so that it runs only once in an outer loop? Can you cache certain calculations to speed up code? This notebook discusses some of these steps to speed up CDF calculations.
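To make the last two questions concrete, here is a minimal sketch using only the standard library. It is not the code from the notebook mentioned above: the normal CDF is just a convenient stand-in, and the function names normal_cdf_naive, normal_cdf_hoisted and cached_cdf are hypothetical.

```python
import math
from functools import lru_cache


def normal_cdf_naive(xs, mu, sigma):
    """Recomputes the same scale factor on every iteration."""
    out = []
    for x in xs:
        scale = sigma * math.sqrt(2.0)  # loop-invariant, but recomputed each time
        out.append(0.5 * (1.0 + math.erf((x - mu) / scale)))
    return out


def normal_cdf_hoisted(xs, mu, sigma):
    """Same result, with the invariant computed once outside the loop."""
    scale = sigma * math.sqrt(2.0)
    erf = math.erf  # local name avoids a repeated attribute lookup
    return [0.5 * (1.0 + erf((x - mu) / scale)) for x in xs]


@lru_cache(maxsize=None)
def cached_cdf(x, mu, sigma):
    """Memoised scalar CDF: repeated arguments are served from the cache."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

The hoisted version returns the same values as the naive one while doing less work per iteration. The cached version only helps when the same arguments recur, and since lru_cache needs hashable arguments it works on scalars rather than NumPy arrays, so whether it pays off depends on your data.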
The next step is to see whether you can make the code run in parallel. Some algorithms are easier to parallelise than others; one such example is Monte Carlo, where each batch of simulations is independent of the others. If you run on the cloud, you can spin up servers with many cores purely to run the parallel operation of interest and then power them down. It might also be possible to speed up some computations by using GPUs. Indeed, many machine learning libraries like TensorFlow and PyTorch work out of the box with GPUs if you configure them, without having to change the rest of your code. There are also a number of Nvidia libraries for data science (RAPIDS) that make it easier to code for GPUs from Python, without needing to do low level CUDA. Numba can also be used to target GPUs, albeit with some additional rewriting of code.
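As a rough illustration of why Monte Carlo parallelises well, the sketch below estimates pi by splitting the random draws into independent chunks and farming them out with concurrent.futures from the standard library. The names count_hits and estimate_pi are hypothetical, and seeding each chunk with consecutive integers is a simplification for the example rather than a recommendation for serious simulation work.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(args):
    """Count random points in the unit square that fall inside the quarter circle."""
    n_samples, seed = args
    rng = random.Random(seed)  # separate generator per task (simplified seeding)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, n_workers=4, seed=0, executor_cls=ProcessPoolExecutor):
    """Split the simulation into n_workers independent chunks and combine the counts."""
    chunk = total_samples // n_workers
    tasks = [(chunk, seed + i) for i in range(n_workers)]
    with executor_cls(max_workers=n_workers) as pool:
        hits = sum(pool.map(count_hits, tasks))
    return 4.0 * hits / (chunk * n_workers)
```

With the default ProcessPoolExecutor, each chunk runs in its own process and so can use a separate core. On platforms that start worker processes by spawning (Windows and macOS, for example), the call to estimate_pi should sit under an if __name__ == "__main__": guard. The executor_cls parameter is there so you can swap in ThreadPoolExecutor for quick checks, although threads generally won’t speed up pure Python loops like this one.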
Obviously, it’s not always easy to speed up code, and as mentioned it is still somewhat of an art. Furthermore, in some cases, it might be very time consuming to rewrite code to speed it up. However, if it’s code you use often, then it might well be worth the effort to see if some of the above ideas work.