The Awkward Case of the Julia Language

05 Feb 2020

No targeted applications
No guarantee on performance lower bound
Non-informative Documentation
No OOP model
1 Indexing
Some random thoughts…

Last year I happily started learning Julia, hoping that this promising language would drastically increase my work efficiency and change the way I work. Unfortunately, I found that the reality is far from what I expected. So far, my conclusion is that there is barely any situation where using Julia consistently boosts work efficiency.

No targeted applications

As someone whose job is to process data and generate insights using machine learning techniques, I am really comfortable doing almost everything with the existing Python toolchain. That is to say, 99% of the time, using only Python is enough. Of course, one of the things Python gets criticized for the most is that the language cannot handle the remaining 1%, which means one has to turn to other solutions. However, Python’s affinity with C/C++ makes it really easy to extend the language, especially when we only want to replace a small part of our code with pure C/C++ or Cython (which is even more efficient, development-wise). The first problem with Julia is that, as a standalone language, it cannot be integrated into other toolchains quickly. If we want to use Julia, we should probably write the entire program in Julia. Judging by the current state of Julia’s ecosystem, this is certainly very difficult. What’s more, we only want to use this language 1% of the time. Is it really worth all the effort? At least for me, it is really difficult to determine “when to use Julia”, because I cannot think of a scenario where the language would drastically shorten the overall time span of my projects.
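To make the "small part in C" workflow concrete, here is a minimal sketch that calls a C function from Python through the standard `ctypes` module. It assumes a POSIX system where the C standard library can be loaded at runtime, and it uses `strlen` purely as a stand-in for a real hot spot; the wrapper name `c_strlen` is made up for illustration.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical wrapper: hand a Python string to C and return its byte length."""
    return libc.strlen(text.encode("utf-8"))
```

The rest of the program stays in Python; only the wrapped call crosses into C. A real project would more likely wrap its own compiled library or use Cython, but the calling pattern is similar.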

No guarantee on performance lower bound

One of Julia’s shiniest features is its speed: it is supposed to run fast. However, what people usually don’t tell you is how slow the language’s compiler is. Julia’s JIT compiler happens to be extremely slow. Data scientists plot data all the time, making small modifications to a few lines of code and rerunning them. Unfortunately, the Julia compiler can incur delays of up to several seconds for these plotting commands, which usually complete within milliseconds in Python. On the other hand, a Julia program is still generally slower than its C/C++ counterpart. So far, Julia’s support for concurrency is still rather primitive. There is no way to write a high-performance program if the language is unable to utilize all CPU cores effectively (then it is just another Python!). As a result, Julia is strictly slower than C/C++, yet it can also be thousands of times slower than Python in some everyday scenarios. This is very undesirable.

Maybe I am still not very proficient in Julia, but for me, writing a scientific computing program in C++ is no slower than writing it in Julia. If I am developing on Linux, I can pull in all sorts of C++ libraries very quickly, and C++ comes with powerful IDEs and debuggers to facilitate development. With Julia, I may need much more time to locate the problem when my code is not working correctly. Also, one has to be very careful about object creation in Julia, as it introduces tremendous overhead. This can happen implicitly, which drastically slows your program down. It can take hours to figure out where the bottleneck is.

Non-informative Documentation

The Julia community is still quite small. As a result, its documentation is too concise. There are only brief introductions to the APIs, and they don’t necessarily tell you how to solve a problem. You have to figure out everything on your own. For example, I once wanted to know how to get the pointer to a Julia array. I cannot believe that it took me 30 minutes to figure out such a trivial problem. Many complicated macros and metaprogramming techniques are described in just a few sentences in the documentation. As a result, I have difficulty reading others’ code. All of this shows that Julia’s documentation needs to be enriched.

No OOP model

Although there has been a lot of debate on whether OOP is good or not, I think the paradigm can make function declarations more concise and the program design more elegant. Also, OOP makes it much easier to manage resources that are controlled by an object, because we can relinquish these resources in the object’s destructor. The lack of an OOP model makes it difficult to manage resources in Julia, especially when working with C modules.
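As an illustration of the pattern I have in mind, here is a minimal Python sketch in which an object owns a resource and gives it back in one well-defined place. The `ScratchBuffer` class and its `release` method are hypothetical names invented for this example, and a plain `bytearray` stands in for memory that a C module would normally own.

```python
class ScratchBuffer:
    """Hypothetical owner of a resource that must be released explicitly."""

    def __init__(self, size: int) -> None:
        # Stand-in for memory or a handle obtained from a C module.
        self.data = bytearray(size)
        self.released = False

    def release(self) -> None:
        # Idempotent cleanup: safe to call more than once.
        if not self.released:
            self.data = bytearray()
            self.released = True

    def __enter__(self) -> "ScratchBuffer":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs when the with-block ends, even if an exception was raised.
        self.release()
        return False  # do not swallow exceptions


with ScratchBuffer(1024) as buf:
    buf.data[0] = 1  # use the resource while it is alive
```

The cleanup logic lives next to the data it cleans up, so callers cannot forget it as long as they use the `with` block. In C++ the destructor plays the same role.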

1 Indexing

1 indexing, inherited from Matlab, a commercial scripting language that looks absolutely horrendous, may be one of the worst design choices of Julia. Because almost all languages use 0 indexing, switching the mindset just messes with my head and makes me much more error-prone. It is really awkward when you are computing the end index of an array slice. Instead of start + length, you have to use start + length - 1. The most ridiculous situation comes when you try to interact with C pointers using unsafe_load, etc. Even though these functions are meant to deal with C pointers, they all use 1 indexing! I can’t describe how absurd this is, because at some point Julia will definitely subtract one from our index so that the memory address is correct. That just sounds like redundant work. This may also indicate that Julia will only work with signed pointers, which drastically limits memory management capabilities on 32-bit systems.
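The index arithmetic can be sketched in a few lines of Python. This is only an illustration of the two conventions, not Julia code; the helper names are made up, and the 1-based helpers emulate inclusive, 1-indexed ranges on top of Python's own 0-based, half-open slices.

```python
def slice_zero_based(seq, start, length):
    """0-based, half-open convention: elements in [start, start + length)."""
    return seq[start:start + length]


def slice_one_based(seq, start, length):
    """1-based, inclusive convention: elements start..(start + length - 1)."""
    end = start + length - 1  # the extra "- 1" the text complains about
    # Translate back to Python's 0-based, half-open slice.
    return seq[start - 1:end]


def byte_offset_one_based(index, elem_size):
    """Byte offset of the index-th element when counting from 1."""
    return (index - 1) * elem_size  # the subtraction done behind the scenes


data = [10, 20, 30, 40, 50]
# Both calls select the same three elements: 20, 30, 40.
zero_based = slice_zero_based(data, 1, 3)
one_based = slice_one_based(data, 2, 3)
```

With the 1-based convention, the first element sits at byte offset 0, so every pointer-style access has to undo the shift by one before the address can be computed.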

Some random thoughts…

Right now, I think Julia is still targeted at the physics and numerical simulation communities. Julia is a free, high-performance alternative to Matlab, and it may attract many users who have experience with that toolchain. But for most computer scientists, Julia is not really superior to Python+numpy or C++.
If Julia is to step into data science, there must be some sort of performance lower bound. Maybe in the future, the compiler will allow users to enter a special mode where optimizations are turned off and code blocks are executed faster in Notebook environments.

Alan Xiang's Blog