To C or not to C? Let Nim answer the question!
Python. Elegant, powerful, smart: a true superhero of programming languages. But like any superhero it must have a weakness, and its weakness is speed. Wise wizards came up with clever ways to overcome this, some of them going as far as making Python fast by not writing Python at all (C extensions). While C is fast, such power should not be treated lightly! For those not eager to deal with C hiccups, I say give Nim a try!
Table of Contents
- 1. Buyers beware!
- 2. Looking for trouble
- 3. Getting out of trouble
- 3.1. cPython
- 3.2. Nuitka
- 3.3. Pypy
- 3.4. Numba
- 3.5. Nim
- 3.6. nim-pymod
- 3.7. Procedure parameter & return types
- 3.8. Support for the following Nim types is in development:
- 3.9. PyMod -> NimTga implementation
- 3.10. Bonus Stage: Going Commando - ditching pymod and using ctypes
- 3.11. Python Code
- 3.12. Compile and run
- 3.13. How do we know the arguments type?
- 4. Did I get out of trouble?
- 5. Hello, is it Nim you're looking for?
- 6. Bonus
“Do I really look like a guy with a plan? You know what I am? I'm a dog chasing cars. I wouldn't know what to do with one if I caught it! You know, I just… do things.” ― The Joker - Heath Ledger
- I am your average Joe programmer. No fancy big corporation, no big project, not much experience.
- Every benchmark out there lies, this one included.
- I am very biased towards Nim.
“Madness, as you know, is a lot like gravity, all it takes is a little push” - The Joker
The focus of this article is not to introduce Nim, nor the other options for speeding up Python presented here. My focus is to spark interest in Nim and give readers reasons to include it in their workflow.
Looking for trouble
"Every Fairy Tale Needs A Good Old Fashioned Villain", so our "villain" must be both CPU intensive and IO intensive, while being relatively easy to explain and reasonably similar to a real use case. Eventually I settled on this:
We have around 1000 Truevision images that we want to open, fill the first half of each with red pixels (to simulate a watermark), and finally save.
Since it's a batch process, the time it takes to edit each image is critical, because every delay adds up to a great amount at the end. To better simulate real-world scenarios, we will deal with a mixture of image sizes: 15, 150, 512, 1024, 2048 and 4096 pixels wide. I use small images to tax just-in-time (JIT) compilation warm-up time, and big images to see how well a solution scales or to let a JIT shine at its best. Before we go further, let's take a quick look at what we are dealing with:
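To make the task concrete, here is a minimal sketch of the "watermark" step on an in-memory image. It assumes the image is a list of rows of (R, G, B) tuples and that "first half" means the top half of the rows; `fill_first_half` and `RED` are hypothetical names, not part of pyTGA or the Nim port.

```python
RED = (255, 0, 0)


def fill_first_half(pixels, color=RED):
    """Return a copy of `pixels` with the first half of its rows set to `color`.

    `pixels` is a list of rows, each row a list of (R, G, B) tuples.
    With an odd number of rows, the middle row is left untouched.
    """
    half = len(pixels) // 2
    return [
        [color] * len(row) if y < half else list(row)
        for y, row in enumerate(pixels)
    ]


# A tiny 2 pixel wide, 4 pixel tall black image.
black = (0, 0, 0)
image = [[black, black] for _ in range(4)]
watermarked = fill_first_half(image)
```

The real benchmark does the same thing per file, with reading and writing the TGA data wrapped around this step.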
Truevision image format
Truevision TGA, often referred to as TARGA, is a raster graphics file format created by Truevision Inc. I chose it because it uses a simple compression algorithm and is heavily used in game engines, so it has some practical uses. Here is an oversimplified table representing the format. The byte layout is not important for our case, but it's here to better illustrate what we are dealing with.
|Header||Data||Footer|
|id length||RLE||Extension offset|
|image specifications||no compression||Signature|
We can either compress the image or not; for a more demanding benchmark, we chose compression.
RLE compression algorithm
Wikipedia says it best:
Run-length encoding (RLE) is a very simple form of lossless data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs.
A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:
12W1B12W3B24W1B14W
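The same idea can be sketched in a few lines of Python. This is a text-based illustration of run-length encoding only: TGA stores its runs as binary packets, and `rle_encode` and `rle_decode` are hypothetical helper names rather than anything taken from pyTGA.

```python
from itertools import groupby


def rle_encode(line):
    """Collapse each run of identical characters into '<count><char>'."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(line))


def rle_decode(encoded):
    """Expand '<count><char>' pairs back into the original string."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may have several digits
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


# 12 white, 1 black, 12 white, 3 black, 24 white, 1 black, 14 white pixels.
scan_line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(scan_line)
print(encoded)  # → 12W1B12W3B24W1B14W
```

The 67-character scan line shrinks to 18 characters, which is why images with large flat areas compress so well with this scheme.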
Getting out of trouble
“If I have to have a past, then I prefer it to be multiple choice.” ― Joker from Batman: The Killing Joke
Now that we know our "villain", how can we overcome it? Before getting our hands dirty with C or Nim, maybe we can dodge a bullet by using "cleaner" ways of getting the required speed. Our mission is not to make the fastest implementation; we strive for a "good enough" speed, good enough to get our job done without spending too much development effort, ideally none.
pyTGA is a pure Python module to manage TGA images. The library supports these kinds of formats (compressed with RLE or uncompressed):
- Grayscale - 8 bit depth
- RGB - 16 bit depth
- RGB - 24 bit depth
- RGBA - 32 bit depth
The hero we need, the hero we want (well, most of the time at least)!
To ruin the surprise, this will be the slowest of the bunch, but it will serve as the reference point for our benchmark while also providing the blueprint for the Nim implementation. By blueprint I mean an almost 1-1 port from Python to Nim. It is important to add that I am not the original author of the module; the honor goes to a guy named Mirco.
Here is a high level view of the code involved. The full code can be found here.
Nuitka is a Python compiler. It's fully compatible with Python 2.6, 2.7, 3.2, 3.3, and 3.4.
You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module.
If interpreting things is slow, why not compile them? Sounds crazy? Think again! The team behind Nuitka did just that, and what we get is a simple, elegant way of speeding up our code. In some cases it can also make distribution easier, because it packs everything into a nice executable. Run
nuitka --recurse-all program.py and you are set.
The --recurse-all option traverses the dependency tree and compiles the dependencies too, one by one.
PyPy is a fast, compliant alternative implementation of the Python language (2.7.12 and 3.3.5). It has several advantages and distinct features, speed, memory usage, compatibility, stackless
Get a huge speed improvement by just replacing python program.py with
pypy program.py. Too good to be true? Yes, yes it is! Two things get in the way: warm-up time and incompatibility with all those good Python modules written with the help of C. While the PyPy team is working on this, for now it could be a show stopper for many.
With a few annotations, array-oriented and math-heavy Python code can be made to be similar in performance to C, C++ and Fortran.
Numba works by generating optimized machine code using the LLVM compiler.
Compilation can run on either CPU or GPU hardware, integrates well with Python scientific software stack.
Not included in the benchmark because the following Python language features are not currently supported:
- Function definition
- Class definition
- Exception handling (try .. except, try .. finally)
- Context management (the with statement)
- Comprehensions (either list, dict, set or generator comprehensions)
- Generator delegation (yield from)
This meant modifying the original code, something I didn't want to do for two reasons. First, I had already spent too much time on something that should have been a weekend project. Second, the Numba implementation would have been quite different from the others, and I wanted the code to look as much as possible like the rest. Numba and Cython deserve a separate blog post each. Stay tuned; maybe the gods of productivity will smile upon me and make this possible.
As the new hero rises, it comes with a promise: "Performance can also be elegant!"
Nim (formerly known as "Nimrod") is a statically typed, imperative programming language that tries to give the programmer ultimate power without compromises on runtime efficiency. This means it focuses on compile-time mechanisms in all their various forms.
Before venturing further, you Python hackers out there should quickly check out Nim for Python programmers; it will come in handy when inspecting the Nim solution that you can find here. I am still figuring this stuff out, so take everything with a grain of salt. A battle-scarred Nim programmer will surely cringe at my source code, but what the hell: "fail fast, fail early, fail often!" as Mozilla likes to say.
For the lazy ones who didn't check the link above, here is a short summary:
|Feature||Python||Nim|
|Execution model||Virtual Machine, JIT||Machine code via C\*|
|Meta-programming||Python (decorators/metaclasses/eval)||Nim (const/when/template/macro)|
|Memory Management||Garbage-collected||Garbage-collected and manual|
|Dependent types||-||Partial support|
|Bigints (arbitrary size)||Yes (transparently)||Yes (via nimble package)|
|Type inference||Duck typing||Yes (extensive support)|
|Operator Overloading||Yes||Yes (on any type)|
\* Other backends supported and/or planned \** Can be achieved with macros
Nim produces small executables without dependencies, so calling the code is easy:
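One way to do that from Python is the standard `subprocess` module. The sketch below is an assumption about how such a call might look, not the code used for the benchmark; `run_executable` is a hypothetical helper, and the Python interpreter stands in for the compiled Nim binary so the example runs anywhere.

```python
import subprocess
import sys


def run_executable(command):
    """Run an external program and return its standard output as text.

    `command` is a list: the program path followed by its arguments.
    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in command; replace it with the path to the compiled Nim
# executable and whatever arguments it expects.
output = run_executable([sys.executable, "-c", "print('done')"])
print(output.strip())
```

Since each call starts a new process, this approach suits coarse-grained work (one call per batch) better than one call per pixel.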
Now that you have fallen in love with Nim, should you replace your whole code base with it? Of course not! Python is great: great packages, great community, great code around it. So why not swap the performance-critical part of the code with Nim and keep the rest of the goodness in Python? Usually this involves a great deal of boilerplate code, but luckily we have tools to do that for us, in this case nim-pymod. Better head over to the project's GitHub page for a great introduction, and after that glance at my humble solution. Code comments should explain the reasoning behind some of the code. And for the Python wrapper part, check this.
What can pymod do for you?
- Auto-generates a Python module that wraps a Nim module
- pymod consists of Nim bindings & Python scripts to automate the generation of Python C-API extensions
- There's even a PyArrayObject that provides a Nim interface to Numpy arrays.
Let's test it quickly on the console
Procedure parameter & return types
The following Nim types are currently supported by Pymod:
|Type family||Nim types||Python2 type||Python3 type|
|floating-point||float, float32, float64, cfloat, cdouble||float||float|
|signed integer||int, int16, int32, int64, cshort, cint, clong||int||int|
|unsigned integer||uint, uint8, uint16, uint32, uint64, cushort, cuint, culong, byte||int||int|
|non-unicode character||char, cchar||str||bytes|
|Numpy array||ptr PyArrayObject||numpy.ndarray||numpy.ndarray|
Support for the following Nim types is in development:
|Type family||Nim types||Python2 type||Python3 type|
|unicode code point (character)||unicode.Rune||unicode||str|
|non-unicode character sequence||seq[char]||str||bytes|
|unicode code point sequence||seq[unicode.Rune]||unicode||str|
|sequence of a single type T||seq[T]||list||list|
PyMod -> NimTga implementation
Since Numpy arrays are not wrapped as nicely in Nim as in Python, we have to do some manual wrapping and unwrapping of nested arrays. Confused? You should be! Let me elaborate: if you have an array of arrays like
arr = [[1st, 2nd, 3rd], [1st, 2nd, 3rd]] the
shape of arr will be
(2, 3): 2 because arr has 2 elements, and 3 because each element is an array with 3 elements. The problem is that internally this is stored as one contiguous run of elements and not as arrays within an array, hence the wrapping and unwrapping of elements.
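The relationship between the nested view and the flat storage can be sketched in plain Python. This assumes row-major (C-style) order, which is Numpy's default layout; `flatten`, `unflatten` and `flat_index` are hypothetical helpers, not part of pymod or Numpy.

```python
def flatten(nested):
    """Unwrap a rectangular list of rows into (flat_list, shape)."""
    rows = len(nested)
    cols = len(nested[0]) if rows else 0
    flat = [value for row in nested for value in row]
    return flat, (rows, cols)


def unflatten(flat, shape):
    """Wrap a flat list back into rows according to shape = (rows, cols)."""
    rows, cols = shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def flat_index(row, col, shape):
    """Position of element [row][col] inside the flat, row-major storage."""
    return row * shape[1] + col


arr = [[1, 2, 3], [4, 5, 6]]
flat, shape = flatten(arr)
print(shape)                        # → (2, 3)
print(flat)                         # → [1, 2, 3, 4, 5, 6]
print(flat[flat_index(1, 2, shape)])  # → 6
```

On the Nim side the same arithmetic applies: element (row, col) lives at offset `row * cols + col` from the start of the buffer.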
Bonus Stage: Going Commando - ditching pymod and using ctypes
Original blog post here: http://akehrer.github.io/posts/connecting-nim-to-python/
Compile and run
How do we know the arguments type?
The --header option produces a C header file in the nimcache folder where the module is compiled.
The things to look for are
NF* x, which means a Nim pointer to an array of floats, and
NI, which is a Nim integer named
xLen0. The 0 comes from Nim's variable name mangling.
The more you use the API and learn what arguments the functions take, the less often you will need the
--header flag.
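On the Python side, a pointer-plus-length pair like that is usually built with a ctypes array. The sketch below only prepares the arguments; it assumes that `NF` maps to a C double and `NI` to a pointer-sized signed integer, and the library path and procedure name in the comments are hypothetical placeholders.

```python
import ctypes


def to_c_double_array(values):
    """Copy a Python sequence of floats into a C array of doubles.

    Returns (buffer, length), matching a (NF* x, NI xLen0) parameter pair.
    """
    array_type = ctypes.c_double * len(values)
    return array_type(*values), len(values)


buf, length = to_c_double_array([1.0, 2.5, 4.0])

# With a compiled library at hand, the call might look like this
# (the path and the procedure name are placeholders):
#
#   lib = ctypes.CDLL("./libexample.so")
#   lib.some_proc.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_ssize_t]
#   lib.some_proc.restype = ctypes.c_double
#   result = lib.some_proc(buf, length)
```

Declaring `argtypes` and `restype` up front lets ctypes convert and check the arguments instead of guessing, which matters most for the return type.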
Did I get out of trouble?
“Enough madness? Enough? And how do you measure madness?” - The Joker
- Motherboard: Abit IP35
- CPU: Intel(R) Xeon(R) X5460 @ 3.16GHz 4 cores
- Memory: DDR2 4GiB @ 800MHz
- HDD: Seagate Barracuda x 2 RAID0
This is my personal rig, nothing fancy here. A snapshot of the benchmarking code is below; you can see the full version here.
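For readers who want to take similar measurements, a minimal timing harness might look like the following. It is a sketch, not the benchmark code used for the results here; `time_call` and its `repeats` parameter are hypothetical names.

```python
import time


def time_call(func, *args, repeats=5):
    """Call func(*args) `repeats` times and return (best, mean) in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)


# Example workload: summing a range stands in for editing one image.
best, mean = time_call(sum, range(100_000), repeats=3)
print(f"best {best:.6f}s, mean {mean:.6f}s")
```

Reporting both numbers is useful with a JIT: a large gap between the best and the mean time hints that the first runs paid for warm-up.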
As expected, CPython is the slowest and the Nim version is the fastest. Nuitka seems a good compromise between ease of use and speed improvement, and I see it as a good candidate for a lot of performance problems.
Detail view on the fastest three
PyPy is the fastest interpreter around and runs pretty fast compared to vanilla Python. Another useful measurement would have been memory consumption, but this is homework for anyone who is interested; just leave your findings in the comments below (pretty please!).
Detail view on small execution time
Here you can clearly see the warm-up time required by PyPy.
Hello, is it Nim you're looking for?
“True love is finding someone whose demons play well with yours” ― The Joker Batman Arkham City
For a great elevator pitch about Nim, head over to nim-lang.org (the Nim homepage). Another good resource, for programming in general and Nim especially, is hookrace.net. For beginners I would suggest What is special about Nim?, What makes Nim practical? and lastly Conclusions on Nim. But do not stop there, more advanced topics await: Introduction to metaprogramming in Nim, NimES: a NES emulator in Nim, Writing a 2D platform game in Nim with SDL2, and many more. If you are too lazy to check them out, just remember these things:
What is Nim?
- new system programming language
- compiles to C
- garbage collection + manual memory management
- thread local garbage collection
- design goals: efficient, expressive, elegant
- as fast as C
- as expressive as Python
- as extensible as Lisp
Uses of Nim
- web development
- operating system development
- scientific computing
- command line applications
- UI applications
- And lots more!
As always, if you find any mistakes or have any suggestions or constructive criticism, please leave a comment. Since there are so few, I always read them!