This is part 1 of a 3-part series on libtorch. This post covers the rationale for PyTorch and using libtorch in production. Part 2 covers the basics of getting your model up and running in libtorch. Part 3 discusses some more advanced topics.
Since my company switched from TensorFlow to PyTorch, I’ve really come to like it. It is much easier to work with and the API is actually somewhat stable (which counts as a feature these days, looking at you TensorFlow). Sometimes it’s the little things that make a library awesome. For example, running the most basic MNIST example in PyTorch does not fill two terminals with DeprecationWarnings and arcane messages about libraries that might or might not be present on your system. Or the fact that the PyPI package for PyTorch ships with CUDA, whereas with TensorFlow you need to install it manually. The fact that PyTorch uses eager execution by default instead of an execution graph model is a big plus as well. TensorFlow does this too as of version 2.0, but it came too late for us.
Anyway, I think TensorFlow 2.0, with an integrated high-level API (tf.keras) and eager execution by default, is a much better product than TensorFlow 1.x was. But they had to completely change the API to get there. I don’t trust them not to do such a thing again, so switching to PyTorch, especially from the perspective of business continuity, seemed to make sense.
Putting custom deep learning models into production is quite a challenge. There are a lot of frameworks and tools for this, but most of them are just HTTP-server wrappers around your model on a distributed platform. That works for 80% of applications, but it didn’t work for us. We ship custom servers with our software pre-installed. The output of our models is fed into external software: we push the results to those services. A REST API does not really make sense in our use case. Also, because each server is a self-contained system, we do not need any of the distributed stuff.
There are some other requirements that are important for a production system like the one described above. Our system provides real-time analytics on incoming data, so generally, the system must excel at these points:
We’ll look into each of these categories and see how we can use libtorch, the C++ API for PyTorch, to satisfy the requirements.
The prototype system we built was completely written in Python. In general, it consisted of three parts: data extraction, preprocessing and inference.
The data extraction part is I/O and networking bound, preprocessing is mostly CPU-bound and inference is GPU-bound by nature. Because of that, there is a lot to be gained by using a multi-threaded model for this system. That’s exactly what our prototype did. Unfortunately, Python’s threading model is famously limited by the Global Interpreter Lock, noticeably hurting both throughput and latency.
The data extraction part was mostly just reading from the network and some I/O elsewhere. Most likely it would not matter whether it was done in Python or on some other platform. The same goes for the output phase, where data was pushed to the external systems.
The preprocessing part of the system consisted of a combination of torch operations to reshape, restructure and preprocess the data into a suitable format. It was quite fast, but I was pretty sure it could be made much faster if I had control over memory allocation, copying operations and a more extensive interface to
I was not so sure whether the inference would actually benefit from running in libtorch versus Python PyTorch, but at the very least it could not get any worse.
The machine learning engine we had built around it was quite a lot of work, and as a young company, we did not feel great about distributing all of our code inside a machine placed at our customers’ sites, regardless of the fact that those systems were encrypted and everything. Moving to a compiled language would improve that situation somewhat, so there’s that.
Mainly, our Python engine was limited by the Global Interpreter Lock and the lack of low-level control over memory allocation and copying.
Note that every percentage point of performance translates almost directly into cost savings in our scenario, since we could fit more processing power into a single server while keeping our price the same. So, we had to get something faster. In PyTorch land, if you want to go faster, you go to libtorch.
libtorch is a C++ API very similar to PyTorch itself. Both are built on ATen, a C++ tensor library, which in turn is built on top of CUDA and cuDNN; it can also be used on the CPU.
libtorch is built to have a very similar API to PyTorch, and most things you can do in PyTorch can be done in libtorch as well. Everything is native C++ though, so you can expect some speedups here and there. Nice!
The only drawback of libtorch compared to PyTorch is the somewhat limited documentation. There are a lot of undocumented features in libtorch (especially the JIT part, we’ll get to that later), and it is not as popular as PyTorch, so expect your StackOverflow queries to yield fewer results. I was up to the challenge though! I figured out some things along the way and decided to document them here for anyone trying to do similar stuff.
Anyway, this is the rationale for why we built a Torch Script port of our PyTorch model. I don’t think you should be too eager to do this; there are only a limited number of reasons to go native. In many situations, Python will work just fine and you shouldn’t bother. If it does make sense for your use case, this series will help you get started with libtorch and might help you avoid some pitfalls that took me many hours to figure out.
Part 2 of this series has some code and will walk you through the process of converting your PyTorch model to a Torch Script module.