2.3 What Julia Aims to Accomplish?

NOTE: In this section we will explain the details of what makes Julia shine as a programming language. If it becomes too technical for you, you can skip it and go straight to Section 4 to learn about tabular data with DataFrames.jl.

The Julia programming language (Bezanson et al., 2017) is a relatively new language, first released in 2012, that aims to be both easy and fast. It “runs like C[9] but reads like Python” (Perkel, 2019). It was made for scientific computing: it can handle large amounts of data and computation while still making it fairly easy to write, manipulate and prototype code.

The creators of Julia explained why they created Julia in a 2012 blog post. They said:

We are greedy: we want more. We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

Most users are attracted to Julia because of its superior speed. After all, Julia is a member of a prestigious and exclusive club: the petaflop club, which is made up of languages that can exceed one petaflop[10] at peak performance. Currently, only C, C++, Fortran and Julia belong to the petaflop club.

But speed is not all that Julia can deliver. Ease of use, Unicode support and effortless code sharing are some of Julia’s other features. We’ll address all of these features in this section, but for now we want to focus on code sharing.

Julia’s package ecosystem is something unique. It enables not only code sharing but also the sharing of user-created types. For example, Python’s pandas uses its own datetime types (such as Timestamp) to handle dates. The same goes for R tidyverse’s lubridate package, which also defines its own classes for working with dates and times. Julia doesn’t need any of this: it has all the date functionality already baked into its standard library. This means that other packages don’t have to worry about dates. They can simply extend the existing Dates types with new functionality by defining new functions, without defining new types. Julia’s Dates module can do amazing stuff, but we are getting ahead of ourselves now. Let’s talk about Julia’s other features.
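As a small taste, here is a sketch of how a package could add functionality on top of the standard Dates types without defining a type of its own. The function is_weekend is a hypothetical helper; Date, Day, dayofweek, Saturday and Sunday come from the Dates standard library.

```julia
using Dates

# Hypothetical helper, as a package might define it: it works directly on the
# standard-library Date type instead of introducing a new date type.
is_weekend(d::Date) = dayofweek(d) in (Saturday, Sunday)

d = Date(2021, 10, 2)  # a Saturday
is_weekend(d)          # true
d + Day(30)            # date arithmetic from the standard library: 2021-11-01
```

Because is_weekend accepts the same Date that every other package uses, its result can be combined freely with any other code that handles dates.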

2.3.1 Julia Versus Other Programming Languages

Figure 1 shows a highly opinionated representation that divides the main open source scientific computing languages into a 2x2 diagram with two axes: Slow-Fast and Easy-Hard. We’ve omitted closed source languages because there are many benefits to allowing other people to run your code for free, as well as to being able to inspect the source code in case of issues.

We’ve put C++ and FORTRAN in the hard and fast quadrant. Being static languages that need compilation, type checking and other professional care and attention, they are really hard to learn and slow to prototype. The advantage is that they are really fast.

R and Python go into the easy and slow quadrant. They are dynamic languages that are interpreted at runtime rather than compiled ahead of time. Because of this, they are really easy to learn and fast to prototype. Of course, this comes with a disadvantage: they are really slow.

Julia is the only language in the easy and fast quadrant. We don’t know of any serious language that would want to be hard and slow, so this quadrant is left empty.

Figure 1: Scientific Computing Language Comparisons: logos for FORTRAN, C++, Python, R and Julia.

Julia is fast! Very fast! It was designed for speed from the beginning. It accomplishes this through multiple dispatch. Basically, the idea is to generate very efficient LLVM code. LLVM code, also known as LLVM intermediate representation (IR) or LLVM instructions, is very low-level, that is, very close to the actual operations that your computer executes. So, in essence, Julia converts your handwritten, easy-to-read code into LLVM code, which is very hard for humans to read but easy for computers, and LLVM then turns it into native machine code. For example, if you define a function that takes one argument and pass an integer to it, Julia will create a specialized MethodInstance for integers. The next time you pass an integer to the function, Julia will look up the MethodInstance that was created earlier and reuse it.

Now, the great trick is that this also works inside a function that calls another function. For example, if some data type is passed into function f, f calls function g, and the data types passed to g are known and always the same, then the generated code for g can be hardcoded into function f! This means that Julia doesn’t even have to look up MethodInstances anymore, and the code can run very efficiently. The tradeoff is that there are cases where earlier assumptions about the hardcoded MethodInstances are invalidated, for example when new methods are defined. Then the MethodInstance has to be recompiled, which takes time. It also takes time to infer what can be hardcoded and what cannot. This explains why Julia can often take a long time to do something the first time: in the background, it is compiling and optimizing your code.
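To see specialization in action, you can ask Julia to show the code it generates. The sketch below uses a hypothetical function double and hypothetical bodies for the f and g described above; @code_llvm and @code_typed come from the InteractiveUtils standard library (loaded automatically in the REPL). The exact output depends on your Julia version and CPU, so it is not reproduced here.

```julia
using InteractiveUtils  # provides @code_llvm and @code_typed; already loaded in the REPL

# Hypothetical example: one generic definition without type annotations
double(x) = 2x

double(3)    # compiles a MethodInstance specialized for Int64
double(3.0)  # compiles a separate MethodInstance for Float64

@code_llvm double(3)    # LLVM IR built from integer instructions
@code_llvm double(3.0)  # LLVM IR built from floating-point instructions

# Hypothetical f and g: since the argument types are known at compile time,
# the call to g is resolved statically and, for a function this small, inlined
g(x) = x + 1
f(x) = 2 * g(x)

@code_typed f(1)  # typically shows an integer add and multiply, with no call to g
```

Calling double(3) a second time reuses the already compiled Int64 specialization, which is why only the first call pays the compilation cost.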

The compiler, in turn, does what it does best: it optimizes machine code[11]. Benchmarks for Julia and several other languages are available in the benchmarks section of Julia’s website, from which Figure 2 was taken[12]. As you can see, Julia is indeed fast.

Figure 2: Julia versus other programming languages.

We really believe in Julia. Otherwise, we wouldn’t be writing this book. We think that Julia is the future of scientific computing and scientific data analysis. It enables the user to develop fast and powerful code with a simple syntax. Usually, researchers develop code by prototyping in a very easy, but slow, language. Once the code is assured to run correctly and fulfill its goal, the process of converting it to a fast, but hard, language begins. This is known as the “Two-Language Problem,” which we discuss next.

2.3.2 The Two-Language Problem

The “Two-Language Problem” is a very typical situation in scientific computing: a researcher devises an algorithm or a solution to tackle the problem or analysis at hand. Then, the solution is prototyped in an easy-to-code language (like Python or R). If the prototype works, the researcher rewrites it in a fast language that is not easy to prototype in (C++ or FORTRAN). Thus, two languages are involved in the process of developing a new solution: one that is easy to prototype in but not suited for implementation (mostly due to being slow), and another that is not so easy to code, and consequently not easy to prototype in, but suited for implementation because it is fast. Julia avoids such situations by being the same language in which you prototype (ease of use) and implement the solution (speed).
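As an illustration, the kind of explicit loop one might write in a quick prototype can stay as it is in Julia: the loop is compiled to native code, so there is often no need to rewrite it in another language or contort it into a vectorized form. The function sum_of_squares is a hypothetical example, not a benchmark.

```julia
# Hypothetical prototype-style function: a plain loop, no vectorization tricks
function sum_of_squares(xs)
    total = zero(eltype(xs))  # start from a zero of the right numeric type
    for x in xs
        total += x^2
    end
    return total
end

sum_of_squares(1:10)        # 385
sum_of_squares([0.5, 1.5])  # 2.5
```

The same generic definition is specialized for a range of integers and for a vector of floats, so the prototype and the "production" version are one and the same function.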

Also, Julia lets you use Unicode characters as variables or parameters. This means no more using sigma or sigma_i: instead, you just use σ or σᵢ as you would in mathematical notation. When you see code for an algorithm or for a mathematical equation, you see almost the same notation and idioms. We call this powerful feature “One-To-One Code and Math Relation.”
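For example, the population mean μ and standard deviation σ can be written almost exactly as in the formula σ = √((1/n) Σᵢ (xᵢ − μ)²). In the Julia REPL, and in many editors with Julia support, you type these symbols with LaTeX-like tab completion, e.g. \sigma then Tab for σ, \mu then Tab for μ and \sqrt then Tab for √. The data values below are made up for illustration.

```julia
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample data
n = length(xs)

μ = sum(xs) / n                          # mean
σ = √(sum((xᵢ - μ)^2 for xᵢ in xs) / n)  # population standard deviation

μ, σ  # (5.0, 2.0)
```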

We think that the “Two-Language problem” and the “One-To-One Code and Math Relation” are best described by one of the creators of Julia, Alan Edelman, in a TEDx Talk (TEDx Talks, 2020) (if you are reading the printed book or a static PDF, please click on the link to go to the video or check the citation):

2.3.3 Multiple Dispatch

To explain multiple dispatch, we’ll give an illustrative example in Python. Suppose that you want to have different types of researcher that will inherit from a “base” class Researcher. The base class Researcher would define the initial common values for every derived class, namely name and age. These would go inside the default constructor method __init__:

class Researcher:
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

Now let us define a Linguist class that will inherit from the Researcher class. We will also define a method citation that returns the citation style most commonly used in the linguistics research field.

class Linguist(Researcher):
    def citation(self):
        return "APA"

We do the same for the ComputerScientist class, but with a different citation style:

class ComputerScientist(Researcher):
    def citation(self):
        return "IEEE"

Finally, let’s instantiate our two researchers, Noam Chomsky, a linguist, and Judea Pearl, a computer scientist, with their respective ages:

noam = Linguist("Noam Chomsky", 92)
judea = ComputerScientist("Judea Pearl", 84)

Now, suppose you want to define a function that behaves differently depending on the types of its arguments. For example, a linguist might approach a computer scientist and ask them to collaborate on a new paper. The approach would be different if the situation were the opposite: a computer scientist approaching a linguist. Below, we define two functions that should behave differently in each case:

def approaches(li: Linguist, cs: ComputerScientist):
    print(f"Hey {cs.name}, wanna do a paper together? We need to use {li.citation()} style.")

def approaches(cs: ComputerScientist, li: Linguist):
    print(f"Hey {li.name}, wanna do a paper together? We need to use {cs.citation()} style.")

Now let’s say Noam Chomsky approaches Judea Pearl with a paper idea:

approaches(noam, judea)
Hey Judea Pearl, wanna do a paper together? We need to use APA style.

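That output looks right, but only by accident. Python ignores type annotations at runtime, so the second def approaches simply replaced the first one: there is only one approaches function, and it does not look at the argument types at all. Here it was called with cs bound to noam (a Linguist) and li bound to judea (a ComputerScientist). Because both definitions have the same shape (greet the second argument by name and use the citation style of the first), duck typing hides the problem; as soon as the two cases need genuinely different behavior, one of them is lost. This is because Python, like C++ and most object-oriented languages, only offers single dispatch: the method that runs is chosen based on a single argument, the object it is called on (the first argument, self). Since both of our researchers noam and judea are instantiated as types inherited from the same base type Researcher, we cannot select behavior based on the types of both arguments in Python. You would need to change your approach with a substantial loss of simplicity. Specifically, you would probably need to write isinstance checks by hand or create different functions with different names.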

Now, let’s do this in Julia.

In Julia, we don’t have classes, but we have structures (struct) that are meant to be “structured data.” A struct defines the kind of information embedded in it, that is, a set of fields (i.e. “properties” or “attributes” in other languages), and individual instances (or “objects”) can then be produced, each with its own specific values for those fields. Structs differ from the primitive types (e.g. Int64 and Float64) that are defined in the core of the Julia language, and are therefore known as user-defined types. Most of the time, users create new abstract types or structs; structs are also known as composite types. In Julia, all structs are final and may only have abstract types as their supertypes.
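The built-in numeric types illustrate how concrete and abstract types relate. The sketch below only uses Base: the subtype operator <: tests whether one type is a subtype of another, supertype returns a type’s direct parent, and isabstracttype and isconcretetype tell you which kind of type you have.

```julia
Int64 <: Integer         # true: Int64 is a subtype of the abstract type Integer
Float64 <: Integer       # false
supertype(Int64)         # Signed
isabstracttype(Integer)  # true
isconcretetype(Int64)    # true: a concrete (final) type, which cannot be subtyped
```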

First we’ll create an abstract type named Researcher.

abstract type Researcher end

We proceed, as before, by creating two structs derived from the Researcher abstract type. Note that <: is the subtype operator: in a struct definition, it declares that the struct is a subtype of another type, which in turn becomes its supertype (and we have the analogous >: operator). Each struct has two fields, one for the researcher’s name and the other for their age, represented as a string and a 64-bit integer, respectively:

struct Linguist <: Researcher
    name::String
    age::Int64
end

struct ComputerScientist <: Researcher
    name::String
    age::Int64
end

The final step is to define two methods of a new function, approaches, that behave differently depending on which struct derived from Researcher is the first or second argument. We also use $ for string interpolation of the researcher’s name:

approaches(li::Linguist, cs::ComputerScientist) = "Hey $(cs.name), wanna do a paper? We need to use APA style."

approaches(cs::ComputerScientist, li::Linguist) = "Hey $(li.name), wanna do a paper? We need to use IEEE style."
approaches (generic function with 2 methods)

Finally, let’s instantiate our two researchers, noam and judea, as we did before in the Python case:

noam = Linguist("Noam Chomsky", 92)
judea = ComputerScientist("Judea Pearl", 84)

Again, let’s see what Noam Chomsky will say when he approaches Judea Pearl with a paper idea:

approaches(noam, judea)

Hey Judea Pearl, wanna do a paper? We need to use APA style.

Perfect! It behaves just as we wanted! This is multiple dispatch, and it is an important feature in Julia. Multiple dispatch acts on all arguments of a function and defines the function’s behavior based on the types of all its arguments.
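To check that Julia really selects a method based on both arguments, here is a self-contained sketch that repeats the definitions above, calls approaches in the opposite order and tries a combination for which no method was defined.

```julia
abstract type Researcher end

struct Linguist <: Researcher
    name::String
    age::Int64
end

struct ComputerScientist <: Researcher
    name::String
    age::Int64
end

approaches(li::Linguist, cs::ComputerScientist) = "Hey $(cs.name), wanna do a paper? We need to use APA style."
approaches(cs::ComputerScientist, li::Linguist) = "Hey $(li.name), wanna do a paper? We need to use IEEE style."

noam = Linguist("Noam Chomsky", 92)
judea = ComputerScientist("Judea Pearl", 84)

# Swapping the argument order selects the other method
approaches(judea, noam)  # "Hey Noam Chomsky, wanna do a paper? We need to use IEEE style."

# No method exists for (Linguist, Linguist), so Julia throws a MethodError
# instead of silently running some other definition
try
    approaches(noam, noam)
catch err
    println(err isa MethodError)  # prints true
end

length(methods(approaches))  # 2
```

Unlike the Python version, both definitions coexist as separate methods of the same function, and an unsupported combination fails loudly.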

Multiple dispatch is a powerful feature that also allows us to extend existing functions or to define custom and complex behavior for new types. To show how this works, we’ll use another example. Suppose that you want to define two new structs for two different animals. For simplicity, we won’t add any fields to the structs:

struct fox end
struct chicken end

Next, we want to define addition for the fox and chicken types. We do this by adding new methods to the + operator from Julia’s Base module[13]:

import Base: +
+(F::fox, C::chicken) = "trouble"
+(C1::chicken, C2::chicken) = "safe"
+ (generic function with 368 methods)

Now, let’s call addition with the + sign on instantiated fox and chicken objects:

my_fox = fox()
my_chicken = chicken()
my_fox + my_chicken

"trouble"

And, as expected, adding two chicken objects together signals that they are safe:

chicken_1 = chicken()
chicken_2 = chicken()
chicken_1 + chicken_2

"safe"
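Note that + now only has methods for the argument orders we defined: chicken + fox is still missing, and calling it throws a MethodError. The self-contained sketch below repeats the two structs and methods, shows the error and then adds the reversed signature by reusing the existing method. As the footnote on this example warns, extending Base operators like this is meant for teaching only.

```julia
import Base: +

struct fox end
struct chicken end

+(F::fox, C::chicken) = "trouble"
+(C1::chicken, C2::chicken) = "safe"

# Only the argument orders defined above exist, so chicken + fox fails
try
    chicken() + fox()
catch err
    println(err isa MethodError)  # prints true
end

# Adding the reversed signature makes + symmetric for this pair
+(C::chicken, F::fox) = F + C

chicken() + fox()  # "trouble"
```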

This is the power of multiple dispatch: we don’t need to build everything from scratch for our custom-defined user types. If you are as excited as we are about multiple dispatch, here are two more in-depth examples. The first is a fast and elegant implementation of a one-hot vector by Storopoli (2021). The second is an interview with Christopher Rackauckas on Tanmay Bakshi’s YouTube channel (see from time 35:07 onwards) (tanmay bakshi, 2021). Chris mentions that, while using DifferentialEquations.jl, a package that he developed and currently maintains, a user filed an issue saying that their GPU-based quaternion ODE solver didn’t work. Chris was quite surprised by this request, since he would never have expected someone to combine GPU computations with quaternions and solving ODEs. He was even more surprised to discover that the user had made a small mistake and that, once it was fixed, it all worked. Most of the credit goes to multiple dispatch and the high degree of user code and type sharing.
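In the same spirit, here is a minimal sketch of a one-hot vector. It is not the implementation referenced above; OneHotVector and its fields len and ind are hypothetical names. By subtyping AbstractVector and defining size and getindex (the basic AbstractArray interface), generic Base functions such as collect and sum work on it, and one extra method of * exploits its structure: multiplying a matrix by a one-hot vector just selects a column.

```julia
# Hypothetical one-hot vector: stores only its length and the index of the 1
struct OneHotVector <: AbstractVector{Int}
    len::Int
    ind::Int
end

# The AbstractArray interface: size plus scalar getindex
Base.size(v::OneHotVector) = (v.len,)

function Base.getindex(v::OneHotVector, i::Int)
    @boundscheck checkbounds(v, i)  # error on out-of-range indices
    return Int(i == v.ind)
end

# Specialized method: A * v is just the column of A selected by the hot index
function Base.:*(A::AbstractMatrix, v::OneHotVector)
    size(A, 2) == length(v) || throw(DimensionMismatch("matrix columns must match vector length"))
    return A[:, v.ind]
end

v = OneHotVector(4, 2)
collect(v)  # [0, 1, 0, 0]
sum(v)      # 1

A = [1 4 7 10; 2 5 8 11; 3 6 9 12]
A * v       # [4, 5, 6]
```

Dispatch picks the specialized * method because OneHotVector is more specific than a generic AbstractVector, so no full matrix-vector product is computed.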

To conclude, we think that multiple dispatch is best explained by one of the creators of Julia, Stefan Karpinski, at JuliaCon 2019 (The Julia Programming Language, 2019) (if you are reading the printed book or a static PDF, please click on the link to go to the video or check the citation):

[9] Sometimes even faster than C.

[10] A petaflop is one thousand trillion, or one quadrillion, operations per second.

[11] If you would like to learn more about how Julia is designed, you should definitely check Bezanson et al. (2017).

[12] Please note that the Julia results depicted above do not include compile time.

[13] This is an example for teaching purposes. Doing something similar to this example will result in many method invalidations and is, therefore, not a good idea.

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer and Lazaro Alonso