Python Numba

Posted onby admin

NUMBANUMTHREADS must be set before Numba is imported, and ideally before Python is launched. As soon as Numba is imported the environment variable is read and that number of threads is locked in as the number of threads Numba launches. If we want to later increase the number of threads used by the process, we cannot.

  1. Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.
  2. Numba translates Python functions to optimized machine code at runtime using the Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even Just apply one of the Numba decorators to your Python function, and Numba does the rest.
  3. From what I've read, numba can significantly speed up a python program. Could my program's time efficiency be increased using numba?

Over the past years, Numba and Cython have gained a lot of attention in the data science community. They both provide a way to speed up CPU intensive tasks, but in different ways. This article describes architectural differences between them.


Numba is a just-in-time (JIT) compiler that translates Python code to native machine instructions both for CPU and GPU. The code can be compiled at import time, runtime, or ahead of time.

It's extremely easy to start using Numba, by simply putting a jit decorator:

As you may know, In Python, all code blocks are compiled down to bytecode:

Code optimization

To optimize Python code, Numba takes a bytecode from a provided function and runs a set of analyzers on it. Python bytecode contains a sequence of small and simple instructions, so it's possible to reconstruct function's logic from a bytecode without using source code from Python implementation. The process of conversion involves many stages, but as a result, Numba translates Python bytecode to LLVM intermediate representation (IR).

Note that LLVM IR is a low-level programming language, which is similar to assembler syntax and has nothing to do with Python.

Numba modes

The are two modes in Numba: nopython and object. The former doesn't use Python runtime and produces native code without Python dependencies. The native code is statically typed and runs very fast. Whereas the object mode uses Python objects and Python C API, which often does not give significant speed improvements. In both cases, Python code is compiled using LLVM.

What is LLVM?

LLVM is a compiler, that takes a special intermediate representation (IR) of the code and compiles it down to native (machine) code. The process of compiling involves a lot of additional passes in which the compiler optimizes IR. LLVM toolchain is very good at optimizing IR, so not only it compiles code for Numba, but also optimizes it.

The whole system roughly looks as follows:

Advantages of Numba:

  • Ease of use
  • Automatic parallelization
  • Support for numpy operations and objects
  • GPU support

Disadvantages of Numba:

Python Numba Tutorial

  • Many layers of abstraction make it very hard to debug and optimize
  • There is no way to interact with Python and its modules in nopython mode
  • Limited support for classes


Instead of analyzing bytecode and generating IR, Cython uses a superset of Python syntax which later translates to C code. When working with Cython, you basically writing C code with high-level Python syntax.

In Cython, you usually don't have to worry about Python wrappers and low-level API calls, because all interactions are automatically expanded to a proper C code.

Unlike Numba, all Cython code should be separated from regular Python code in special files. Cython parses and translates such files to C code and then compiles it using provided C compiler (e.g. gcc).

Python code is already valid Cython code.


Python Numba Example

However, typed version works a lot faster.

Python Numba

Writing fast Cython code requires an understanding of C and Python internals. If you know C, your Cython code can run as fast as C code.

Advantages of Cython:

  • Control over Python API usage
  • Easy interfacing with C/C++ libraries and C/C++ code
  • Parallel execution support
  • Support for Python classes, which gives object-oriented features in C

Disadvantages of Cython:

  • Learning curve
  • Requires expertise both in C and Python internals
  • Inconvenient organization of modules

Numba vs Cython

Using Numba Python

Personally, I prefer Numba for small projects and ETL experiments. You can always plug it into existing projects. If I need to start a big project or write a wrapper for a C library, I will go with Cython, because it gives you more control and easier to debug.

Also, Cython is the standard for many libraries such as pandas, scikit-learn, scipy, Spacy, gensim, and lxml.

Want a monthly digest of these blog posts?

  • Jameel Kassouri 3 years ago (from disqus) #

    Still unclear on one thing, if numba's object mode 'often does not give significant speed improvements', why have it at all?

    • Artem 3 years ago (from disqus) #

      Object mode can be useful when you have a lot of nested loops. It gives 10-50% speedup by just adding jit decorator.