Language Syntax - Julia Data Science

3.2 Language Syntax

Julia is a dynamically typed language with a just-in-time compiler. This means that you don’t need to compile your program before you run it, as you would in C++ or FORTRAN. Instead, Julia takes your code, infers types where necessary, and compiles parts of the code just before running them. You also don’t need to explicitly specify each type: Julia will infer types for you on the go.

The main differences between Julia and other dynamic languages such as R and Python are the following. First, Julia allows the user to specify type declarations. You already saw some type declarations in Why Julia? (Section 2): they are those double colons :: that sometimes come after variables. However, if you don’t want to specify the type of your variables or functions, Julia will gladly infer (guess) them for you.

Second, Julia allows users to define function behavior across many combinations of argument types via multiple dispatch. We also covered multiple dispatch in Section 2.3. There, we defined different behavior for different argument types by writing new function signatures under the same function name.

3.2.1 Variables

A variable is a value that you tell the computer to store under a specific name, so that you can later recover or change it. Julia has several types of variables but, in data science, we mostly use:

Integers: Int64
Real Numbers: Float64
Boolean: Bool
Strings: String

Integers and real numbers are stored using 64 bits by default, which is why they have the 64 suffix in the name of the type. If you need a different size, there are other types such as Int8 or Int128, where a higher number means more bits: a wider range of values for integers and more precision for floating-point numbers (e.g. Float32 versus Float64). Most of the time, this won’t be an issue so you can just stick to the defaults.
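To make the effect of the bit width concrete, here is a small sketch using the built-in typemin, typemax, sizeof, and eps functions. The commented integer values assume the standard two’s-complement ranges.

```julia
# The range of signed integers grows with the number of bits.
println(typemin(Int8), " to ", typemax(Int8))   # -128 to 127
println(typemax(Int64))                          # 9223372036854775807
println(typemax(Int128) > typemax(Int64))        # true

# Storage size in bytes.
println(sizeof(Int8), " ", sizeof(Int64), " ", sizeof(Float64))  # 1 8 8

# For floats, more bits means a finer resolution around 1.0.
println(eps(Float32))   # 1.1920929f-7
println(eps(Float64))   # 2.220446049250313e-16
```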

We create new variables by writing the variable name on the left and its value on the right, with the = assignment operator in the middle. For example:

name = "Julia"
age = 9

9

Note that the return output of the last statement (age) was printed to the console. Here, we are defining two new variables: name and age. We can recover their values by typing the names given in the assignment:

name


Julia

If you want to define new values for an existing variable, you can repeat the steps in the assignment. Note that Julia will now overwrite the previous value with the new one. Suppose that Julia’s birthday has passed and it has now turned 10:

age = 10

We can do the same with its name. Suppose that Julia has earned some titles due to its blazing speed. We would change the variable name to the new value:

name = "Julia Rapidus"


Julia Rapidus

We can also do operations on variables such as addition or division. Let’s see how old Julia is, in months, by multiplying age by 12:

12 * age

120

We can inspect the types of variables by using the typeof function:

typeof(age)

Int64
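As a short sketch of how operations and typeof interact: the result type depends on the operation, not only on the inputs. The variable names months, half, and whole are illustrative, and the Int64 results assume a 64-bit system.

```julia
age = 10

months = 12 * age      # integer times integer stays an integer
half = age / 2         # the / operator always returns a float
whole = div(age, 3)    # integer division drops the remainder

println(months, " ", typeof(months))   # 120 Int64
println(half, " ", typeof(half))       # 5.0 Float64
println(whole, " ", typeof(whole))     # 3 Int64
```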

The next question then becomes: “What else can I do with integers?” There is a handy function, methodswith, that lists every method available for a certain type, along with its signature. Here, we will restrict the output to the first 5 entries:

first(methodswith(Int64), 5)

[1] AbstractFloat(x::Int64) @ Base float.jl:268
[2] Float16(x::Int64) @ Base float.jl:159
[3] Float32(x::Int64) @ Base float.jl:159
[4] Float64(x::Int64) @ Base float.jl:159
[5] Int64(x::Union{Bool, Int32, Int64, UInt16, UInt32, UInt64, UInt8, Int128, Int16, Int8, UInt128}) @ Core boot.jl:784

3.2.2 User-defined Types

Having variables around without any sort of hierarchy or relationships is not ideal. In Julia, we can group related values into structured data with a struct (also known as a composite type). Inside each struct, you can specify a set of fields. Structs differ from the primitive types (e.g. integers and floats) that are already defined by default inside the core of the Julia language. Since most structs are user-defined, they are known as user-defined types.

For example, let’s create a struct to represent scientific open source programming languages. We’ll also define a set of fields along with the corresponding types inside the struct:

struct Language
    name::String
    title::String
    year_of_birth::Int64
    fast::Bool
end

To inspect the field names, you can use the fieldnames function and pass the desired struct as an argument:

fieldnames(Language)

(:name, :title, :year_of_birth, :fast)

To use structs, we must instantiate individual instances (or “objects”), each with its own specific values for the fields defined inside the struct. Let’s instantiate two instances, one for Julia and one for Python:

julia = Language("Julia", "Rapidus", 2012, true)
python = Language("Python", "Letargicus", 1991, false)

Language("Python", "Letargicus", 1991, false)
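Once an instance exists, its fields can be read with dot syntax. The sketch below repeats the struct definition so that it runs on its own.

```julia
struct Language
    name::String
    title::String
    year_of_birth::Int64
    fast::Bool
end

julia = Language("Julia", "Rapidus", 2012, true)
python = Language("Python", "Letargicus", 1991, false)

# Read individual fields with instance.field
println(julia.name)                                   # Julia
println(python.year_of_birth)                         # 1991
println(julia.year_of_birth - python.year_of_birth)   # 21
```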

One thing to note with structs is that we can’t change their field values once they are instantiated. We can solve this with a mutable struct. Note, however, that mutable objects will generally be slower and more error-prone. Whenever possible, make everything immutable. Let’s create a mutable struct.

mutable struct MutableLanguage
    name::String
    title::String
    year_of_birth::Int64
    fast::Bool
end

julia_mutable = MutableLanguage("Julia", "Rapidus", 2012, true)

MutableLanguage("Julia", "Rapidus", 2012, true)

Suppose that we want to change julia_mutable’s title. Now, we can do this since julia_mutable is an instantiated mutable struct:

julia_mutable.title = "Python Obliteratus"

julia_mutable

MutableLanguage("Julia", "Python Obliteratus", 2012, true)
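For contrast, here is a sketch of what happens when the same kind of assignment is attempted on the immutable Language struct. Julia throws an exception, which we catch here so that the example keeps running; the variable name failed is illustrative.

```julia
struct Language
    name::String
    title::String
    year_of_birth::Int64
    fast::Bool
end

julia = Language("Julia", "Rapidus", 2012, true)

# Assigning to a field of an immutable struct throws an exception.
failed = try
    julia.title = "Python Obliteratus"
    false
catch err
    println("Could not modify field: ", typeof(err))
    true
end

println(failed)        # true
println(julia.title)   # Rapidus (unchanged)
```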

3.2.3 Boolean Operators and Numeric Comparisons

Now that we’ve covered types, we can move to boolean operators and numeric comparison.

We have three boolean operators in Julia:

!: NOT
&&: AND
||: OR

Here are a few examples with some of them:

!true


false

(false && true) || (!false)


true

(6 isa Int64) && (6 isa Real)


true
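One detail worth knowing is that && and || short-circuit: the right-hand side is evaluated only when it can still affect the result. The sketch below uses a hypothetical helper, noisy_true, that counts how often it is called.

```julia
calls = Ref(0)   # mutable counter shared with the helper below

# Hypothetical helper: records that it was called, then returns true.
function noisy_true()
    calls[] += 1
    return true
end

false && noisy_true()   # left side already decides the result: helper skipped
true || noisy_true()    # same here: helper skipped
println(calls[])        # 0

true && noisy_true()    # left side is true, so the right side must run
println(calls[])        # 1
```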

Regarding numeric comparison, Julia has three major types of comparisons:

Equality: either something is equal or not equal to another
- == “equal”
- != or ≠ “not equal”
Less than: either something is less than or less than or equal to
- < “less than”
- <= or ≤ “less than or equal to”
Greater than: either something is greater than or greater than or equal to
- > “greater than”
- >= or ≥ “greater than or equal to”

Here are some examples:

1 == 1


true

1 >= 10


false

It even works between different types:

1 == 1.0


true
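A brief sketch of two related points: == compares values across types, whereas === also requires the same type, and floating-point rounding can make == surprising, in which case isapprox (also written ≈) is the usual alternative.

```julia
println(1 == 1.0)            # true: same numeric value
println(1 === 1.0)           # false: Int64 and Float64 are different types

# Floating-point arithmetic is not exact.
println(0.1 + 0.2 == 0.3)            # false
println(isapprox(0.1 + 0.2, 0.3))    # true
println(0.1 + 0.2 ≈ 0.3)             # true, ≈ is the operator form of isapprox
```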

We can also mix and match boolean operators with numeric comparisons:

(1 != 10) || (3.14 <= 2.71)


true

3.2.4 Functions

Now that we know how to define variables and custom types as structs, let’s turn our attention to functions. In Julia, a function maps argument values to one or more return values. The basic syntax goes like this:

function function_name(arg1, arg2)
    result = stuff with the arg1 and arg2
    return result
end

The function declaration begins with the keyword function followed by the function name. Then, inside parentheses (), we define the arguments separated by a comma ,. Inside the function, we specify what we want Julia to do with the arguments that we supplied. Variables that we define inside a function are local to it: they are no longer accessible after the function returns. This is nice because it is like an automatic cleanup. After all the operations in the function body are finished, we instruct Julia to return the final result with the return statement. Finally, we let Julia know that the function definition is finished with the end keyword.

There is also the compact assignment form:

f_name(arg1, arg2) = stuff with the arg1 and arg2

It is the same function as before but with a different, more compact, form. As a rule of thumb, when your code can fit easily on one line of up to 92 characters, then the compact form is suitable. Otherwise, just use the longer form with the function keyword. Let’s dive into some examples.
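As a concrete sketch of the two forms side by side, the following defines the same operation twice. The names square_long and square_short are hypothetical and chosen only for illustration.

```julia
# Long form with the function keyword
function square_long(x)
    result = x^2
    return result
end

# Compact assignment form
square_short(x) = x^2

println(square_long(4))    # 16
println(square_short(4))   # 16
```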

3.2.4.1 Creating new Functions

Let’s create a new function that adds numbers together:

function add_numbers(x, y)
    return x + y
end

add_numbers (generic function with 1 method)

Now, we can use our add_numbers function:

add_numbers(17, 29)

46

It also works with floats:

add_numbers(3.14, 2.72)


5.86
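Because add_numbers has no type declarations, it accepts any pair of arguments for which + is defined. A small sketch: mixing an integer with a float promotes the result to a float, while two strings fail because + has no method for them.

```julia
function add_numbers(x, y)
    return x + y
end

mixed = add_numbers(1, 2.5)   # Int64 + Float64
println(mixed)                # 3.5
println(typeof(mixed))        # Float64

# Strings have no + method, so this call throws a MethodError.
string_failed = try
    add_numbers("a", "b")
    false
catch err
    err isa MethodError
end
println(string_failed)        # true
```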

We can also define custom behavior by specifying type declarations. Suppose that we want a round_number function that behaves differently depending on whether its argument is a Float64 or an Int64:

function round_number(x::Float64)
    return round(x)
end

function round_number(x::Int64)
    return x
end

round_number (generic function with 2 methods)

We can see that it is a function with multiple methods:

methods(round_number)

round_number(x::Int64)
     @ Main none:5

round_number(x::Float64)
     @ Main none:1

There is one issue: what happens if we want to round a 32-bit float Float32? Or an 8-bit integer Int8?

If you want something to function on all float and integer types, you can use an abstract type as the type signature, such as AbstractFloat or Integer:

function round_number(x::AbstractFloat)
    return round(x)
end

round_number (generic function with 3 methods)

Now, it works as expected with any float type:

x_32 = Float32(1.1)
round_number(x_32)

1.0f0
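The same idea covers the integer side of the question. This is a sketch, not part of the original definitions: it adds a method for the abstract type Integer so that Int8 and every other integer type are handled too.

```julia
function round_number(x::AbstractFloat)
    return round(x)
end

# Assumed extension: any integer type is returned unchanged.
function round_number(x::Integer)
    return x
end

small = round_number(Int8(7))
println(small, " ", typeof(small))    # 7 Int8
println(round_number(Float32(1.1)))   # 1.0f0
println(round_number(2.7))            # 3.0
```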

NOTE: We can inspect types with the supertypes and subtypes functions.
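A minimal sketch of that inspection follows. It assumes Julia 1.5 or later for supertypes; both functions live in the InteractiveUtils standard library, which the REPL loads automatically but a script has to load itself.

```julia
using InteractiveUtils   # loaded automatically in the REPL

# The chain of supertypes, from the concrete type up to Any.
println(supertypes(Float64))   # (Float64, AbstractFloat, Real, Number, Any)

# The direct parent of a type.
println(supertype(Int64))      # Signed

# The direct children of an abstract type.
println(Float32 in subtypes(AbstractFloat))   # true
```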

Let’s go back to the Language struct that we defined above and use it for another example of multiple dispatch. We will extend the Base.show function, which prints the output of instantiated types and structs.

By default, a struct has a basic output, which you saw above in the python case. We can define a new Base.show method for our Language type, so that we have some nice printing for our programming language instances. We want to clearly communicate each programming language’s name, title, and age in years. The function Base.show accepts as arguments an IO type named io followed by the type for which you want to define custom behavior:

Base.show(io::IO, l::Language) = print(
    io, l.name, ", ",
    2021 - l.year_of_birth, " years old, ",
    "has the following titles: ", l.title
)

Now, let’s see how python will output:

python

Python, 30 years old, has the following titles: Letargicus
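The custom method is not limited to the REPL display. As a sketch, functions that convert a value to text, such as string and sprint, go through the same Base.show method, so they produce the same output. The definitions are repeated so that the block runs on its own.

```julia
struct Language
    name::String
    title::String
    year_of_birth::Int64
    fast::Bool
end

Base.show(io::IO, l::Language) = print(
    io, l.name, ", ",
    2021 - l.year_of_birth, " years old, ",
    "has the following titles: ", l.title
)

python = Language("Python", "Letargicus", 1991, false)

text = string(python)                   # builds a String via Base.show
println(text)                           # Python, 30 years old, has the following titles: Letargicus
println(sprint(show, python) == text)   # true
```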

3.2.4.2 Multiple Return Values

A function can also return two or more values. See the new function add_multiply below:

function add_multiply(x, y)
    addition = x + y
    multiplication = x * y
    return addition, multiplication
end

add_multiply (generic function with 1 method)

In that case, we can do two things:

We can mirror the return statement and define two variables to hold the function’s return values, one for each return value:
```
return_1, return_2 = add_multiply(1, 2)
return_2
```
```
2
```
Or we can define just one variable to hold the function’s return values and access them with either first or last:
```
all_returns = add_multiply(1, 2)
last(all_returns)
```
```
2
```
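Both approaches work because multiple return values are packed into a single Tuple. A short sketch, with add_multiply repeated in a condensed form and the Int64 type assuming a 64-bit system:

```julia
function add_multiply(x, y)
    return x + y, x * y
end

all_returns = add_multiply(1, 2)

println(all_returns)           # (3, 2)
println(typeof(all_returns))   # Tuple{Int64, Int64}

# Tuples can also be indexed by position.
println(all_returns[1])        # 3
println(all_returns[2])        # 2
```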

3.2.4.3 Keyword Arguments

Some functions accept keyword arguments as well as positional arguments. Keyword arguments are just like regular arguments, except that they are passed by name, and they are defined after the function’s regular arguments and separated from them by a semicolon ;. For example, let’s define a logarithm function that takes its base as a keyword argument, defaulting to base \(e\) (2.718281828459045). Note that, here, we are using the abstract type Real so that we cover all types derived from Integer and AbstractFloat, both of which are subtypes of Real:

AbstractFloat <: Real && Integer <: Real


true

function logarithm(x::Real; base::Real=2.7182818284590)
    return log(base, x)
end

logarithm (generic function with 1 method)

It works without specifying the base argument as we supplied a default argument value in the function declaration:

logarithm(10)


2.3025850929940845

And also with the keyword argument base different from its default value:

logarithm(10; base=2)


3.3219280948873626
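Two details about calling such a function, sketched below: at the call site the keyword can follow either a semicolon or a comma, and the base cannot be passed positionally because no two-argument method exists. The variable names are illustrative.

```julia
function logarithm(x::Real; base::Real=2.7182818284590)
    return log(base, x)
end

with_semicolon = logarithm(10; base=2)
with_comma = logarithm(10, base=2)        # equivalent call
println(with_semicolon == with_comma)     # true

# Passing the base as a second positional argument is a MethodError.
positional_failed = try
    logarithm(10, 2)
    false
catch err
    err isa MethodError
end
println(positional_failed)                # true
```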

3.2.4.4 Anonymous Functions

Often we don’t care about the name of the function and want to quickly make one. What we need are anonymous functions. They are used a lot in Julia’s data science workflow. For example, when using DataFrames.jl (Section 4) or Makie.jl (Section 6), sometimes we need a temporary function to filter data or format plot labels. That’s when we use anonymous functions. They are especially useful when we don’t want to create a named function, and a simple in-place statement would be enough.

The syntax is simple. We use the -> operator. On the left of -> we define the parameter name. And on the right of -> we define what operations we want to perform on the parameter that we defined on the left of ->. Here is an example. Suppose that we want to undo the log transformation by using an exponentiation:

map(x -> 2.7182818284590^x, logarithm(2))

2.0

Here, we are using the map function to conveniently map the anonymous function (the first argument) to logarithm(2) (the second argument). As a result, we get back the same number, because logarithm and exponentiation are inverses (at least in the base that we’ve chosen, 2.7182818284590).
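Anonymous functions are more commonly applied to collections. The following sketch uses the built-in map and filter functions on a small vector; the variable names are illustrative.

```julia
numbers = [1, 2, 3, 4]

# Apply an anonymous function to every element.
squares = map(x -> x^2, numbers)
println(squares)   # [1, 4, 9, 16]

# Keep only the elements for which the anonymous function returns true.
evens = filter(x -> x % 2 == 0, numbers)
println(evens)     # [2, 4]

# Anonymous functions with several parameters use a tuple on the left of ->.
add = (x, y) -> x + y
println(add(1, 2))   # 3
```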

3.2.5 Conditional If-Else-Elseif

In most programming languages, the user is allowed to control the computer’s flow of execution. Depending on the situation, we want the computer to do one thing or another. In Julia we can control the flow of execution with if, elseif, and else keywords. These are known as conditional statements.

The if keyword prompts Julia to evaluate an expression and, depending on whether it’s true or false, execute certain portions of code. We can chain several conditions with the elseif keyword for more complex control flow. We can also define an alternative portion to be executed if none of the if or elseif conditions evaluates to true. This is the purpose of the else keyword. Finally, as with the previous keyword blocks that we saw, we must tell Julia when the conditional statement is finished with the end keyword.

Here’s an example with all the if-elseif-else keywords:

a = 1
b = 2

if a < b
    "a is less than b"
elseif a > b
    "a is greater than b"
else
    "a is equal to b"
end


a is less than b

We can even wrap this in a function called compare:

function compare(a, b)
    if a < b
        "a is less than b"
    elseif a > b
        "a is greater than b"
    else
        "a is equal to b"
    end
end

compare(3.14, 3.14)

a is equal to b
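The compare function returns a string without an explicit return because an if block is itself an expression: it evaluates to the value of the branch that ran. A small sketch that assigns that value directly to a variable (message is an illustrative name):

```julia
a = 3
b = 5

# The whole if block evaluates to the last expression of the executed branch.
message = if a < b
    "a is less than b"
elseif a > b
    "a is greater than b"
else
    "a is equal to b"
end

println(message)   # a is less than b
```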

3.2.6 For Loop

The classical for loop in Julia follows a syntax similar to that of the conditional statements. You begin with a keyword, in this case for. Then, you specify what Julia should “loop” over, i.e., a sequence. Also, like everything else, you must finish with the end keyword.

So, to make Julia print every number from 1 to 10, you can use the following for loop:

for i in 1:10
    println(i)
end
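The sequence does not have to be a range of numbers. As a sketch, the loop below iterates over a vector of strings and uses the built-in enumerate function to get each position together with its element; languages and labels are illustrative names.

```julia
languages = ["Julia", "Python", "R"]
labels = String[]   # collects one label per element

for (i, lang) in enumerate(languages)
    label = string(i, ": ", lang)
    println(label)
    push!(labels, label)
end
# prints:
# 1: Julia
# 2: Python
# 3: R
```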

3.2.7 While Loop

The while loop is a mix of the previous conditional statements and for loops. Here, the loop body is executed repeatedly for as long as the condition is true. The syntax follows the same form as the previous one. We begin with the keyword while, followed by an expression that evaluates to true or false. As usual, you must end with the end keyword.

Here’s an example:

n = 0

while n < 3
    global n += 1
end

n

3

As you can see, we have to use the global keyword. This is because of variable scope. Variables defined inside loops and functions exist only inside them (if blocks, by contrast, do not introduce a new scope). This is known as the scope of the variable. Here, n was defined outside the loop, in the global scope, so we had to tell Julia with the global keyword that the n assigned inside the while loop is that global variable.

Finally, we also used the += operator which is a nice shorthand for n = n + 1.
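A common way to avoid global altogether is to put the loop inside a function, where n is an ordinary local variable. A minimal sketch, with the hypothetical name count_up_to:

```julia
# Hypothetical helper: counts from 0 up to limit and returns the final value.
function count_up_to(limit)
    n = 0              # local to the function
    while n < limit
        n += 1         # no global needed: n belongs to the enclosing function
    end
    return n
end

println(count_up_to(3))   # 3
```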

CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso