# To C or not to C? Let Nim answer the question!

Python. Elegant, powerful, smart, a true superhero of programming languages. But like any superhero, it must have a weakness, and its weakness is speed. Wise wizards came up with clever ways to overcome this, some of them going as far as making Python fast by not writing Python at all (C extensions). While C is fast, such power should not be treated lightly! For those not eager to deal with C hiccups, I say give Nim a try!

“Do I really look like a guy with a plan? You know what I am? I'm a dog chasing cars. I wouldn't know what to do with one if I caught it! You know, I just… do things.” ― The Joker - Heath Ledger

• I am your average Joe programmer. No fancy big corporation, no big project, not much experience.
• Every benchmark out there lies, this one included.
• I am very biased towards Nim.

“Madness, as you know, is a lot like gravity, all it takes is a little push” - The Joker

This article is not an introduction to Nim, nor to the other ways of speeding up Python presented here. My focus is sparking interest in Nim and giving readers reasons to include it in their workflow.

# Looking for trouble

"Every Fairy Tale Needs A Good Old Fashioned Villain", so our "villain" must be both CPU intensive and IO intensive, while being relatively easy to explain and reasonably similar to a real use case. Eventually I settled on this:

We have around 1000 Truevision images. We want to open each one, fill its first half with red pixels (to simulate a watermark), and finally save it.
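The edit itself can be sketched in a few lines of plain Python. This is only an illustration, not the pyTGA API: the list-of-rows image, the `make_image` and `watermark_first_half` names, and the reading of "first half" as the top half of the rows are all assumptions made for the example.

```python
# Hypothetical in-memory image: a list of rows, each row a list of (R, G, B) tuples.
RED = (255, 0, 0)
BLACK = (0, 0, 0)


def make_image(width, height, color=BLACK):
    """Create a width x height image filled with a single color."""
    return [[color for _ in range(width)] for _ in range(height)]


def watermark_first_half(pixels):
    """Paint the first half of the rows red, in place, and return the image."""
    half = len(pixels) // 2  # assumption: "first half" means the top rows
    for row in pixels[:half]:
        for x in range(len(row)):
            row[x] = RED
    return pixels


image = watermark_first_half(make_image(4, 4))
```

In the real benchmark every implementation also has to decode the file before this step and encode it again afterwards, which is where most of the time goes.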

Since it's a batch process, the time it takes to edit each image is critical, because every delay adds up to a large amount by the end. To better simulate real-world scenarios, we will deal with a mixture of image sizes: 15, 150, 512, 1024, 2048 and 4096 pixels wide. I use small images to tax the just-in-time (JIT) compilation warm-up time, and big images to see how well a solution scales or to let the JIT shine at its best. Before we go further, let's take a quick look at what we are dealing with:

## Truevision image format

Truevision TGA, often referred to as TARGA, is a raster graphics file format created by Truevision Inc. I chose it because it uses a simple compression algorithm and it's heavily used in game engines, so it has some practical uses. Here is an oversimplified table representing the format. The byte layout is not important for our case, but it's here to better illustrate what we are dealing with.

| Header | Image data | Footer |
|---|---|---|
| id length | RLE | Extension offset |
| ... | or | ... |
| image specifications | no compression | Signature |

We can either compress the image or not, but for a more demanding benchmark we chose to compress it.

## RLE compression algorithm

Wikipedia says it best:

Run-length encoding (RLE) is a very simple form of lossless data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs.

A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows: WWWWBBBWWBBBBBBW With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows: 4W3B2W6B1W
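The Wikipedia example can be reproduced with a short Python sketch. The function names `rle_encode` and `rle_decode` are made up for this illustration. Real TGA files apply RLE to pixel packets rather than to text, so this shows the idea only, not the file format.

```python
from itertools import groupby


def rle_encode(line):
    """Collapse each run of identical characters into '<count><char>'."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(line))


def rle_decode(encoded):
    """Expand '<count><char>' pairs back into the original string."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may span several digits
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


scan_line = "WWWWBBBWWBBBBBBW"
encoded = rle_encode(scan_line)
```

Decoding the result with `rle_decode` gives back the original scan line, which is what makes the scheme lossless.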

# Getting out of trouble

“If I have to have a past, then I prefer it to be multiple choice.” ― Joker from Batman: The Killing Joke

Now that we know our "villain", how can we overcome it? Before getting our hands dirty with C or Nim, maybe we can dodge a bullet by using "cleaner" ways of getting the required speed. Our mission is not to make the fastest implementation possible. We strive for a "good enough" speed, good enough to get our job done without spending too much development effort on it, ideally none.

pyTGA is a pure Python module to manage TGA images. The library supports these kinds of formats (compressed with RLE or uncompressed):

• Grayscale - 8 bit depth
• RGB - 16 bit depth
• RGB - 24 bit depth
• RGBA - 32 bit depth

## CPython

The hero we need, the hero we want (well, most of the time at least)!

To ruin the surprise, this will be the slowest of the bunch, but it will serve as the reference point for our benchmark while also providing the blueprints for the Nim implementation. By blueprints I mean an almost 1:1 port from Python to Nim. It is important to add that I am not the original author of the module; the honor goes to a guy named Mirco.

Here is a high level view of the code involved. The full code can be found here.

## Nuitka

Nuitka is a Python compiler. It's fully compatible with Python 2.6, 2.7, 3.2, 3.3, and 3.4.

You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module.

If interpreting things is slow, why not compile them? Sounds crazy? Think again! The team behind Nuitka did just that, and what we get is a simple, elegant way of speeding up our code. In some cases it can also make distribution easier, because it packs everything in a nice *.exe.

Run `nuitka --recurse-all program.py` and you are set. The `--recurse-all` option will traverse the dependency tree and compile the dependencies too, one by one.

## Pypy

PyPy is a fast, compliant alternative implementation of the Python language (2.7.12 and 3.3.5). It has several advantages and distinct features: speed, memory usage, compatibility, stackless.

Get a huge speed improvement by just replacing python with pypy, e.g. `pypy program.py`. Too good to be true? Yes, yes it is! Two things: warm-up time and incompatibility with all those good Python modules written with the help of C. While the PyPy team is trying to solve this, for now it could be a show stopper for many.

## Numba

With a few annotations, array-oriented and math-heavy Python code can be made to be similar in performance to C, C++ and Fortran.

Numba works by generating optimized machine code using the LLVM compiler.

Compilation can run on either CPU or GPU hardware, and it integrates well with the Python scientific software stack.

Numba is not included in the benchmark because it does not currently support the following Python language features:

• Function definition
• Class definition
• Exception handling (try .. except, try .. finally)
• Context management (the with statement)
• Comprehensions (either list, dict, set or generator comprehensions)
• Generator delegation (yield from)

This meant modifying the original code, something I didn't want to do for two reasons. First, I had already spent too much time on something that should have been a weekend project. Second, the Numba implementation would have been quite different from the others, and I wanted the code to look as much as possible like the rest. Numba and Cython deserve a separate blog post each. Stay tuned, maybe the gods of productivity will smile upon me and make this possible.

## Nim

As the new hero rises, it comes with a promise: "Performance can also be elegant!"

Nim (formerly known as "Nimrod") is a statically typed, imperative programming language that tries to give the programmer ultimate power without compromises on runtime efficiency. This means it focuses on compile-time mechanisms in all their various forms.

Before venturing further, you Python hackers out there should quickly check out Nim for Python programmers. It will come in handy when inspecting the solution implemented in Nim, which you can find here. I am still figuring this stuff out, so take everything with a grain of salt. A battle-scarred Nim programmer will surely cringe at my source code, but what the hell: "fail fast, fail early, fail often!" as Mozilla likes to say.

For the lazy ones who didn't check the link above, here is a short summary:

| Feature | Python | Nim |
|---|---|---|
| Execution model | Virtual Machine, JIT | Machine code via C\* |
| Meta-programming | Python (decorators/metaclasses/eval) | Nim (const/when/template/macro) |
| Memory Management | Garbage-collected | Garbage-collected and manual |
| Types | Dynamic | Static |
| Dependent types | - | Partial support |
| Generics | Duck typing | Yes |
| int8/16/32/64 types | No | Yes |
| Bigints (arbitrary size) | Yes (transparently) | Yes (via nimble package) |
| Arrays | Yes | Yes |
| Bounds-checking | Yes | Yes |
| Type inference | Duck typing | Yes (extensive support) |
| Closures | Yes | Yes |
| Custom Operators | No | Yes |
| Object-Oriented | Yes | Minimalistic\*\* |
| Methods | Yes | Yes |
| Multi-Methods | No | Yes |
| Exceptions | Yes | Yes |

\* Other backends supported and/or planned

\*\* Can be achieved with macros

Nim produces small executables without dependencies, so calling the code is easy: `program_bin args`

## nim-pymod

Now that you have fallen in love with Nim, should you replace your whole code base with it? Of course not! Python is great: great packages, great community, great code around it. So why not swap the performance-critical part of the code for Nim and keep the rest of the goodness in Python? Usually this involves a great deal of boilerplate code, but luckily we have tools to do that for us, in this case nim-pymod. Better head over to the project's GitHub page for a great introduction, and after that glance at my humble solution. The code comments should explain the reasoning behind some of the code. And for the Python wrapper part, check this.

What can pymod do for you?

• Auto-generates a Python module that wraps a Nim module
• pymod consists of Nim bindings & Python scripts to automate the generation of Python C-API extensions
• There's even a PyArrayObject that provides a Nim interface to Numpy arrays.

Let's test it quickly on the console

## Procedure parameter & return types

The following Nim types are currently supported by Pymod:

| Type family | Nim types | Python2 type | Python3 type |
|---|---|---|---|
| floating-point | float, float32, float64, cfloat, cdouble | float | float |
| signed integer | int, int16, int32, int64, cshort, cint, clong | int | int |
| unsigned integer | uint, uint8, uint16, uint32, uint64, cushort, cuint, culong, byte | int | int |
| non-unicode character | char, cchar | str | bytes |
| string | string | str | str |
| Numpy array | ptr PyArrayObject | numpy.ndarray | numpy.ndarray |

## Support for the following Nim types is in development:

| Type family | Nim types | Python2 type | Python3 type |
|---|---|---|---|
| signed integer | int8 | int | int |
| boolean | bool | bool | bool |
| unicode code point (character) | unicode.Rune | unicode | str |
| non-unicode character sequence | seq[char] | str | bytes |
| unicode code point sequence | seq[unicode.Rune] | unicode | str |
| sequence of a single type T | seq[T] | list | list |

## PyMod -> NimTga implementation

Since Numpy arrays are not as nicely wrapped in Nim as they are in Python, we have to do some manual wrapping and unwrapping of nested arrays. Confused? You should be! Let me elaborate: if you have an array of arrays like arr = [[1st, 2nd, 3rd], [1st, 2nd, 3rd]], the shape of arr will be (2, 3): 2 because arr has 2 elements, and 3 because each element is an array with 3 elements. The problem is that internally this is stored as one contiguous run of elements and not as an array of arrays, hence the wrapping and unwrapping of elements.
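The wrapping and unwrapping idea can be shown in plain Python, without Numpy or Nim. The helper names `flatten`, `unflatten` and `flat_index` are made up for this sketch, and it assumes row-major order, where the elements of each inner array sit next to each other in memory.

```python
def flatten(nested):
    """Unwrap an array of arrays into one contiguous list (row-major)."""
    return [value for row in nested for value in row]


def unflatten(flat, shape):
    """Wrap a contiguous list back into rows, given a (rows, cols) shape."""
    rows, cols = shape
    if rows * cols != len(flat):
        raise ValueError("shape does not match the number of elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def flat_index(row, col, cols):
    """Position of element (row, col) inside the contiguous storage."""
    return row * cols + col


arr = [[1, 2, 3], [4, 5, 6]]   # shape (2, 3)
flat = flatten(arr)            # how the data is actually laid out
restored = unflatten(flat, (2, 3))
```

With this layout, element `arr[1][0]` lives at position `flat_index(1, 0, 3)` of the flat storage, which is the kind of arithmetic the Nim side has to do by hand.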

## Bonus Stage: Going Commando - ditching pymod and using ctypes

Original blog post here: http://akehrer.github.io/posts/connecting-nim-to-python/

## How do we know the arguments type?

The --header option will produce a C header file in the nimcache folder where the module is compiled.

The things to look for are `NF* x`, which means a Nim pointer to an array of floats, and `NI xLen0`, which is a Nim integer named xLen0. The 0 comes from Nim's variable name mangling. The more you use the API and learn what arguments its functions take, the less often you will need the --header flag.
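On the Python side, those two header entries translate into a ctypes array and a length. The sketch below only prepares the arguments. It assumes that `NF` is a C double and that `NI` is a pointer-sized signed integer, and the helper name `to_nim_float_array`, the library file name and the procedure name in the comments are placeholders, not names from the original post.

```python
import ctypes


def to_nim_float_array(values):
    """Pack Python floats into a C double array plus its length.

    Assumption: NF maps to C double and NI to a pointer-sized signed integer.
    """
    array_type = ctypes.c_double * len(values)
    return array_type(*values), ctypes.c_ssize_t(len(values))


x, x_len = to_nim_float_array([1.0, 2.5, 4.0])

# Hypothetical call into a Nim shared library (names are placeholders):
# lib = ctypes.CDLL("./libexample.so")
# lib.some_proc.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_ssize_t]
# lib.some_proc(x, x_len)
```

Setting `argtypes` is optional with ctypes, but it lets Python reject a wrongly typed argument before it reaches the compiled code.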

# Did I get out of trouble?

“Enough madness? Enough? And how do you measure madness?” - The Joker

## PC specs

• Motherboard: Abit IP35
• CPU: Intel(R) Xeon(R) X5460 @ 3.16GHz 4 cores
• Memory: DDR2 4GiB @ 800MHz
• HDD: Seagate Barracuda x 2 RAID0

This is my personal rig, nothing fancy here. A snapshot of the benchmarking code is below; you can see the full version here.
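For readers who want to time their own variants, a minimal harness could look like the following. This is not the benchmark code used for the results in this post, only a sketch. `time_call` and `busy_work` are made-up names, and `busy_work` stands in for the real "open, watermark, save" step.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def busy_work(n):
    """Placeholder workload standing in for editing one image."""
    return sum(i * i for i in range(n))


elapsed = time_call(busy_work, 10_000)
print(f"best of 3: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from other processes, but it also hides warm-up cost, so a JIT-based runtime such as PyPy is better judged on the first run as well.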

## Results

As expected, CPython is the slowest and the Nim version is the fastest. Nuitka seems a good compromise between ease of use and speed improvement, and I see it as a good candidate for a lot of performance problems.

## Detail view on the fastest three

PyPy is the fastest interpreter around, and it runs pretty fast compared to vanilla Python. Another useful measurement would have been memory consumption, but this is homework for anyone who is interested. Just leave your findings in the comments below (pretty please!).

## Detail view on small execution time

Here you can clearly see the warm-up time required by PyPy.

# Hello, is it Nim you're looking for?

“True love is finding someone whose demons play well with yours” ― The Joker Batman Arkham City

For a great elevator pitch about Nim, head over to nim-lang.org (the Nim homepage). Another good resource, for programming in general and Nim especially, is hookrace.net. To begin with I would suggest What is special about nim?, What makes Nim practical? and lastly Conclusions on Nim!. But do not stop there, more advanced topics await: Introduction to metaprogramming in Nim, NimEs: a NES emulator in Nim, Writing a 2d platform game in Nim with SDL2, and many more. If you are too lazy to check them out, just remember these things:

## What is Nim?

• new system programming language
• compiles to C
• garbage collection + manual memory management
• design goals: efficient, expressive, elegant

## Goals

• as fast as C
• as expressive as Python
• as extensible as Lisp

## Uses of Nim

• web development
• games
• compilers
• operating system development
• scientific computing
• scripting
• command line applications
• UI applications
• And lots more!

# Bonus

I also gave a presentation at my local Python meetup group, RoPython. Here is a video (in Romanian, but the slides are in English), and the corresponding slides.