3.3 Native Data Structures

Julia has several native data structures. They are abstractions that represent data in some structured form. We will cover the most used ones. Some hold homogeneous data, others heterogeneous data. Most of them are collections, so they can be looped over with a for loop.

We will cover String, Tuple, NamedTuple, UnitRange, Arrays, Pair, Dict, Symbol.

When you stumble across a data structure in Julia, you can find methods that accept it as an argument with the methodswith function. In Julia, functions and methods are distinct: a function is a named operation, and every function can have multiple methods, one for each combination of argument types, as we have shown earlier. The methodswith function is nice to have in your bag of tricks. Let’s see what we can do with a String, for example:

first(methodswith(String), 5)
[1] String(s::String) @ Core boot.jl:375
[2] Symbol(s::String) @ Core boot.jl:513
[3] ==(x::String, y::PosLenString) @ WeakRefStrings ~/.julia/packages/WeakRefStrings/31nkb/src/poslenstrings.jl:72
[4] ==(a::String, b::String) @ Base strings/string.jl:143
[5] ==(x::String, y::WeakRefStrings.WeakRefString{T}) where T @ WeakRefStrings ~/.julia/packages/WeakRefStrings/31nkb/src/WeakRefStrings.jl:47
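To make the function/method distinction concrete, here is a minimal sketch. The function name describe is hypothetical (it is not part of Julia or of this book); it only illustrates that one function can carry several methods, one per argument type.

```julia
# `describe` is a hypothetical function used only for illustration.
describe(x::Integer) = "an integer"
describe(x::AbstractString) = "a string"

# One function, two methods: Julia picks the method from the argument type.
n_methods = length(methods(describe))   # 2
int_result = describe(1)                # "an integer"
str_result = describe("Julia")          # "a string"
```

Calling methods(describe) at the REPL lists both definitions, in the same spirit as the methodswith output above.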

3.3.1 Broadcasting Operators and Functions

Before we dive into data structures, we need to talk about broadcasting (also known as vectorization) and the “dot” operator (.).

We can broadcast mathematical operations like * (multiplication) or + (addition) using the dot operator. For example, broadcasted addition would imply a change from + to .+:

[1, 2, 3] .+ 1
[2, 3, 4]

It also works automatically with functions. (Technically, the mathematical operations, or infix operators, are also functions, but that is not so important to know.) Remember our logarithm function?

logarithm.([1, 2, 3])
[0.0, 0.6931471805599569, 1.0986122886681282]
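The dot syntax is not limited to built-in functions. The sketch below uses a hypothetical helper called double (an assumption for illustration) and also shows the broadcast function and the @. macro, which adds a dot to every call and operator in an expression.

```julia
double(x) = 2x   # hypothetical helper for illustration

v = [1, 2, 3]

dotted = double.(v)               # [2, 4, 6]
explicit = broadcast(double, v)   # same result, written without the dot
fused = @. double(v) + 1          # equivalent to double.(v) .+ 1, i.e. [3, 5, 7]
```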

3.3.2 Functions with a bang !

It is a Julia convention to append a bang ! to names of functions that modify one or more of their arguments. This convention warns the user that the function is not pure, i.e., that it has side effects. A function with side effects is useful when you want to update a large data structure or variable container without having all the overhead from creating a new instance.

For example, we can create a function that adds 1 to each element in a vector V:

function add_one!(V)
    for i in eachindex(V)
        V[i] += 1
    end
    return nothing
end

my_data = [1, 2, 3]

add_one!(my_data)

my_data
[2, 3, 4]
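Base follows the same convention. As a small sketch, compare sort, which returns a new vector, with sort!, which reorders its argument in place. The variable names are illustrative only.

```julia
original = [3, 1, 2]

sorted_copy = sort(original)   # returns a new sorted vector
snapshot = copy(original)      # `original` is still [3, 1, 2] at this point

sort!(original)                # sorts in place; `original` is now [1, 2, 3]
```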

3.3.3 String

Strings are delimited by double quotes:

typeof("This is a string")
String

We can also write a multiline string:

text = "
This is a big multiline string.
As you can see.
It is still a String to Julia.
"

This is a big multiline string.
As you can see.
It is still a String to Julia.

But it is usually clearer to use triple quotation marks:

s = """
    This is a big multiline string with a nested "quotation".
    As you can see.
    It is still a String to Julia.
    """

This is a big multiline string with a nested "quotation".
As you can see.
It is still a String to Julia.

When using triple quotes, the indentation and the newline at the start are ignored by Julia. This improves code readability because you can indent the block in your source code without those spaces ending up in your string.

String Concatenation

A common string operation is string concatenation. Suppose that you want to construct a new string that is the concatenation of two or more strings. This is accomplished in Julia either with the * operator or the join function. The * symbol might sound like a weird choice for concatenation and it actually is. For now, many Julia codebases are using this symbol, so it will stay in the language. If you’re interested, you can read a discussion from 2015 about it at https://github.com/JuliaLang/julia/issues/11030.

hello = "Hello"
goodbye = "Goodbye"

hello * goodbye

HelloGoodbye
As you can see, we are missing a space between hello and goodbye. We could concatenate an additional " " string with the *, but that would be cumbersome for more than two strings. That’s where the join function comes in handy. We pass the strings as a vector inside brackets [], followed by the separator:

join([hello, goodbye], " ")

Hello Goodbye

String Interpolation

Concatenating strings can be convoluted. We can be much more expressive with string interpolation. It works like this: you specify whatever you want to be included in your string with the dollar sign $. Here’s the example before but now using interpolation:

"$hello $goodbye"

Hello Goodbye
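Interpolation is not restricted to plain variable names. As a short sketch, wrapping an expression in $( ) evaluates it and inserts the result. The variable names below are illustrative.

```julia
hello = "Hello"

# Parentheses let you interpolate an arbitrary expression, not just a variable.
shout = "$(uppercase(hello))!"     # "HELLO!"
arithmetic = "1 + 2 = $(1 + 2)"    # "1 + 2 = 3"
```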

It even works inside functions. Let’s revisit our test function from Section 3.2.5:

function test_interpolated(a, b)
    if a < b
        "$a is less than $b"
    elseif a > b
        "$a is greater than $b"
    else
        "$a is equal to $b"
    end
end

test_interpolated(3.14, 3.14)

3.14 is equal to 3.14

String Manipulations

There are several functions to manipulate strings in Julia. We will demonstrate the most common ones. Also, note that most of these functions accept a Regular Expression (regex) as arguments. We won’t cover Regular Expressions in this book, but you are encouraged to learn about them, especially if most of your work uses textual data.

First, let us define a string for us to play around with:

julia_string = "Julia is an amazing open source programming language"

Julia is an amazing open source programming language

  1. contains, startswith and endswith: A conditional (returns either true or false) that checks whether the second argument is a:

    • substring of the first argument

      contains(julia_string, "Julia")
      true
    • prefix of the first argument

      startswith(julia_string, "Julia")
      true
    • suffix of the first argument

      endswith(julia_string, "Julia")
      false
  2. lowercase, uppercase, titlecase and lowercasefirst:

    lowercase(julia_string)
    julia is an amazing open source programming language
    uppercase(julia_string)
    JULIA IS AN AMAZING OPEN SOURCE PROGRAMMING LANGUAGE
    titlecase(julia_string)
    Julia Is An Amazing Open Source Programming Language
    lowercasefirst(julia_string)
    julia is an amazing open source programming language
  3. replace: introduces a new syntax, called the Pair:

    replace(julia_string, "amazing" => "awesome")
    Julia is an awesome open source programming language
  4. split: breaks up a string by a delimiter:

    split(julia_string, " ")
    SubString{String}["Julia", "is", "an", "amazing", "open", "source", "programming", "language"]

String Conversions

Often, we need to convert between types in Julia. To convert a number to a string we can use the string function:

my_number = 123
typeof(string(my_number))
String

Sometimes, we want the opposite: convert a string to a number. Julia has a handy function for that: parse.

typeof(parse(Int64, "123"))
Int64

Sometimes, we want to play safe with these conversions. That’s when the tryparse function steps in. It has the same functionality as parse but returns either a value of the requested type, or nothing. That makes tryparse handy when we want to avoid errors. Of course, you would need to deal with all those nothing values afterwards.

tryparse(Int64, "A very non-numeric string")
nothing
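Dealing with the nothing values can be sketched in two common ways: substituting a default with something, or dropping failed entries with filter and isnothing. The input vector and variable names are made up for illustration.

```julia
inputs = ["1", "2", "oops", "4"]

# tryparse returns `nothing` for entries that are not valid integers.
parsed = [tryparse(Int64, s) for s in inputs]

# Option 1: replace each `nothing` with a default value.
with_default = [something(p, 0) for p in parsed]   # [1, 2, 0, 4]

# Option 2: drop the entries that failed to parse.
only_valid = filter(!isnothing, parsed)            # holds 1, 2 and 4
```

Which option is appropriate depends on whether a silent default is acceptable for your data.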

3.3.4 Tuple

Julia has a data structure called tuple. Tuples are special in Julia because they are often used in relation to functions. Since functions are an important feature in Julia, every Julia user should know the basics of tuples.

A tuple is a fixed-length container that can hold multiple different types. A tuple is an immutable object, meaning that it cannot be modified after instantiation. To construct a tuple, use parentheses () to delimit the beginning and end, along with commas , as delimiters between values:

my_tuple = (1, 3.14, "Julia")
(1, 3.14, "Julia")

Here, we are creating a tuple with three values. Each one of the values is a different type. We can access them via indexing, like this:

my_tuple[2]
3.14


We can also loop over tuples with the for keyword. And even apply functions to tuples. But we can never change any value of a tuple since they are immutable.

Remember the functions that return multiple values that we defined earlier? Let’s inspect what our add_multiply function returns:

return_multiple = add_multiply(1, 2)
typeof(return_multiple)
Tuple{Int64, Int64}

This is because return a, b is the same as return (a, b):

1, 2
(1, 2)

So, now you can see why they are often related.
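Because a multiple-value return is a tuple, it can be unpacked directly into separate variables. A minimal sketch, using a hypothetical helper min_max that is not defined elsewhere in this book:

```julia
# `min_max` is a hypothetical helper; it returns two values, i.e. a tuple.
min_max(xs) = minimum(xs), maximum(xs)

result = min_max([3, 1, 2])   # (1, 3)

# Destructuring assigns each tuple element to its own variable.
lo, hi = min_max([3, 1, 2])
```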

One more thing about tuples. When you want to pass more than one variable to an anonymous function, guess what you would need to use? Once again: tuples!

map((x, y) -> x^y, 2, 3)
8

Or, even more than two arguments:

map((x, y, z) -> x^y + z, 2, 3, 1)
9

3.3.5 Named Tuple

Sometimes, you want to name the values in tuples. That’s when named tuples come in. Their functionality is pretty much the same as tuples: they are immutable and can hold any type of value.

The construction of named tuples is slightly different from that of tuples. You have the familiar parentheses () and the comma , value separator. But now you name the values:

my_namedtuple = (i=1, f=3.14, s="Julia")
(i = 1, f = 3.14, s = "Julia")

We can access a named tuple’s values via indexing like regular tuples or, alternatively, access them by name with the dot (.):

my_namedtuple.s
Julia


To finish our discussion of named tuples, there is one important quick syntax that you’ll see a lot in Julia code. Often Julia users create a named tuple by using the familiar parentheses () and commas, but without explicitly naming the values. To do so, you begin the named tuple construction with a semicolon ; before the values, and each variable name becomes the name of its value. This is especially useful when the values that would compose the named tuple are already defined in variables or when you want to avoid long lines:

i = 1
f = 3.14
s = "Julia"

my_quick_namedtuple = (; i, f, s)
(i = 1, f = 3.14, s = "Julia")
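A few more operations on named tuples, as a brief sketch with made-up names: keys lists the field names, and since named tuples are immutable, merge is one way to build an updated copy.

```julia
point = (x=1, y=2)

field_names = keys(point)        # (:x, :y)
by_name = point.y                # 2
by_index = point[1]              # 1

# Named tuples are immutable, so "updating" means building a new one.
moved = merge(point, (y=10,))    # (x = 1, y = 10)
```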

3.3.6 Ranges

A range in Julia represents an interval between start and stop boundaries. The syntax is start:stop:

1:10
1:10

As you can see, our instantiated range is of type UnitRange{T} where T is the type inside the UnitRange:

typeof(1:10)
UnitRange{Int64}

And, if we gather all the values, we get:

[x for x in 1:10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
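Gathering the values is often unnecessary. A range stores its boundaries rather than every value, and many common operations work on it directly, as this small sketch suggests:

```julia
r = 1:10

n = length(r)        # 10
has_five = 5 in r    # true
total = sum(r)       # 55
third = r[3]         # 3
```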

We can also construct ranges for other types:

typeof(1.0:10.0)
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}

Sometimes, we want to change the default interval step size behavior. We can do that by adding a step size in the range syntax start:step:stop. For example, suppose we want a range of Float64 from 0 to 1 with steps of size 0.2:

0.0:0.2:1.0
0.0:0.2:1.0

If you want to “materialize” a range into a collection, you can use the function collect:

collect(1:10)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

We have an array of the type specified in the range between the boundaries that we’ve set. Speaking of arrays, let’s talk about them.

3.3.7 Array

In its most basic form, arrays hold multiple objects. For example, they can hold multiple numbers in one-dimension:

myarray = [1, 2, 3]
[1, 2, 3]

Most of the time you would want arrays of a single type for performance reasons, but note that they can also hold objects of different types:

myarray = ["text", 1, :symbol]
Any["text", 1, :symbol]

They are the “bread and butter” of data scientists, because arrays underlie most data manipulation and data visualization workflows.

Therefore, arrays are an essential data structure.

Array Types

Let’s start with array types. There are several, but we will focus on the two most used in data science:

  • Vector{T}: one-dimensional array. Alias for Array{T, 1}.
  • Matrix{T}: two-dimensional array. Alias for Array{T, 2}.

Note here that T is the type of the elements of the array. So, for example, Vector{Int64} is a Vector in which all elements are Int64s, and Matrix{AbstractFloat} is a Matrix in which all elements are subtypes of AbstractFloat.

Most of the time, especially when dealing with tabular data, we are using either one- or two-dimensional arrays. They are both Array types for Julia. But, we can use the handy aliases Vector and Matrix for clear and concise syntax.

Array Construction

How do we construct an array? In this section, we start by constructing arrays in a low-level way. This can be necessary to write high performing code in some situations. However, in most situations, this is not necessary, and we can safely use more convenient methods to create arrays. These more convenient methods will be described later in this section.

The low-level constructor for Julia arrays is the default constructor. It accepts the element type as the type parameter inside the {} brackets, and inside the constructor you pass an initializer followed by the desired dimensions. It is common to initialize vectors and matrices with undefined elements by passing undef as the initializer. A vector of 10 undef Float64 elements can be constructed as:

my_vector = Vector{Float64}(undef, 10)
[6.9498108744618e-310, 6.94984512327103e-310, 6.9498108744689e-310, 6.9499201217594e-310, 6.94984746106397e-310, 6.9499210339702e-310, 1.56683e-319, NaN, 2.1219957915e-314, 6.949804534423e-310]

For matrices, since we are dealing with two-dimensional objects, we need to pass two dimension arguments inside the constructor: one for rows and another for columns. For example, a matrix with 10 rows and 2 columns of undef elements can be instantiated as:

my_matrix = Matrix{Float64}(undef, 10, 2)
10×2 Matrix{Float64}:
 4.66839e-313  1.86736e-312
 0.0           1.78248e-312
 5.94159e-313  1.78248e-312
 0.0           1.71882e-312
 8.48798e-314  0.0
 7.21479e-313  0.0
 8.70018e-313  3.8e-322
 2.08e-322     1.93102e-312
 1.31564e-312  1.99468e-312
 1.54906e-312  9.76118e-313

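We also have some syntax aliases for the most common elements in array construction: zeros initializes all elements to zero and ones initializes all elements to one. Both default to Float64 elements, which can be changed by passing a type as the first argument.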

For other elements, we can first instantiate an array with undef elements and use the fill! function to fill all elements of an array with the desired element. Here’s an example with 3.14 (π):

my_matrix_π = Matrix{Float64}(undef, 2, 2)
fill!(my_matrix_π, 3.14)
2×2 Matrix{Float64}:
 3.14  3.14
 3.14  3.14
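Base also provides zeros, ones, and fill (without the bang) as convenience constructors that allocate and initialize in one step. A short sketch with illustrative variable names:

```julia
# zeros and ones default to Float64 unless an element type is given first.
z = zeros(3)             # [0.0, 0.0, 0.0]
o = ones(Int64, 2, 2)    # 2×2 matrix of integer ones

# fill (no bang) allocates a new array already filled with the value.
p = fill(3.14, 2, 2)
```

Compared with the undef-then-fill! pattern, fill never exposes uninitialized memory, at the cost of always allocating a new array.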

We can also create arrays with array literals. For example, here’s a 2x2 matrix of integers:

[[1 2]
 [3 4]]
2×2 Matrix{Int64}:
 1  2
 3  4

Array literals also accept a type specification before the [] brackets. So, if we want the same 2x2 array as before but now as floats, we can do so:

Float64[[1 2]
        [3 4]]
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  4.0

It also works for vectors:

Bool[0, 1, 0, 1]
Bool[0, 1, 0, 1]

You can even mix and match array literals with the constructors:

[ones(Int, 2, 2) zeros(Int, 2, 2)]
2×4 Matrix{Int64}:
 1  1  0  0
 1  1  0  0

[zeros(Int, 2, 2)
 ones(Int, 2, 2)]
4×2 Matrix{Int64}:
 0  0
 0  0
 1  1
 1  1

[ones(Int, 2, 2) [1; 2]
 [3 4]            5]
3×3 Matrix{Int64}:
 1  1  1
 1  1  2
 3  4  5

Another powerful way to create an array is to write an array comprehension. This way of creating arrays is better in most cases: it avoids loops, indexing, and other error-prone operations. You specify what you want to do inside the [] brackets. For example, say we want to create a vector of squares from 1 to 10:

[x^2 for x in 1:10]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

They also support multiple inputs:

[x*y for x in 1:10 for y in 1:2]
[1, 2, 2, 4, 3, 6, 4, 8, 5, 10, 6, 12, 7, 14, 8, 16, 9, 18, 10, 20]

And conditionals:

[x^2 for x in 1:10 if isodd(x)]
[1, 9, 25, 49, 81]

As with array literals, you can specify your desired type before the [] brackets:

Float64[x^2 for x in 1:10 if isodd(x)]
[1.0, 9.0, 25.0, 49.0, 81.0]
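One detail worth sketching: chaining two for clauses yields a flat vector, while separating the iterators with a comma yields a matrix with one dimension per iterator. The variable names are illustrative.

```julia
# Two `for` clauses produce a flat vector...
flat = [x * y for x in 1:3 for y in 1:2]   # [1, 2, 2, 4, 3, 6]

# ...whereas comma-separated iterators produce a matrix,
# with one dimension per iterator (3 rows for x, 2 columns for y).
grid = [x * y for x in 1:3, y in 1:2]      # [1 2; 2 4; 3 6]
```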

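Finally, we can also create arrays with concatenation functions. Concatenation is a standard term in computer programming and means “to chain together”. For example, we can concatenate the strings “aa” and “bb” to get “aabb”:

"aa" * "bb"
aabb

And, we can concatenate arrays to create new arrays:

  • cat: concatenates input arrays along the dimensions given by the dims keyword
  • vcat: vertical concatenation, a shorthand for cat(...; dims=1)
  • hcat: horizontal concatenation, a shorthand for cat(...; dims=2)

Array Inspection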

Once we have arrays, the next logical step is to inspect them. There are a lot of handy functions that allow the user to have an insight into any array.

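It is most useful to know what type of elements are inside an array. We can do this with eltype:

eltype(my_matrix_π)
Float64

After knowing its element type, one might be interested in the array’s dimensions. Julia has several functions to inspect array dimensions:

  • length: total number of elements
  • ndims: number of dimensions
  • size: a tuple with the length of each dimension

Array Indexing and Slicing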

Sometimes, we want to inspect only certain parts of an array. This is called indexing and slicing. If you want a particular observation of a vector, or a row or column of a matrix, you’ll probably need to index an array.

First, we will create an example vector and matrix to play around with:

my_example_vector = [1, 2, 3, 4, 5]

my_example_matrix = [[1 2 3]
                     [4 5 6]
                     [7 8 9]]

Let’s start with vectors. Suppose that you want the second element of a vector. You append [] brackets with the desired index inside:

my_example_vector[2]
2

The same syntax follows with matrices. But, since matrices are 2-dimensional arrays, we have to specify both rows and columns. Let’s retrieve the element from the second row (first dimension) and first column (second dimension):

my_example_matrix[2, 1]
4

Julia also has conventional keywords for the first and last elements of an array: begin and end. For example, the second to last element of a vector can be retrieved as:

my_example_vector[end-1]
4

This also works for matrices. Let’s retrieve the element of the last row and second column:

my_example_matrix[end, begin+1]
8

Often, we are not only interested in just one array element, but in a whole subset of array elements. We can accomplish this by slicing an array. It uses the same index syntax, but with the added colon : to denote the boundaries that we are slicing through the array. For example, suppose we want to get the 2nd to 4th element of a vector:

my_example_vector[2:4]
[2, 3, 4]

We could do the same with matrices. With matrices in particular, if we want to select all elements along a dimension, we can do so with just a colon :. For example, to get all the elements in the second row:

my_example_matrix[2, :]
[4, 5, 6]

You can interpret this as something like “take the 2nd row and all the columns”.
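A slice taken this way is a copy, so changing it leaves the original array untouched. The sketch below contrasts that with the @view macro, which refers to the original memory. It uses its own matrix A so that the example matrix above is not modified.

```julia
A = [1 2 3
     4 5 6
     7 8 9]

row_copy = A[2, :]         # slicing on the right-hand side makes a copy
row_copy[1] = 100          # A is unchanged
unchanged = A[2, 1]        # 4

row_view = @view A[2, :]   # a view refers to the original memory
row_view[1] = 100          # this writes through to A
changed = A[2, 1]          # 100
```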

It also supports begin and end:

my_example_matrix[begin+1:end, end]
[6, 9]

Array Manipulations

There are several ways we could manipulate an array. The first would be to manipulate a single element of the array. We just index the array by the desired element and proceed with an assignment =:

my_example_matrix[2, 2] = 42
3×3 Matrix{Int64}:
 1   2  3
 4  42  6
 7   8  9

Or, you can manipulate a certain subset of elements of the array. In this case, we need to slice the array and then assign with =:

my_example_matrix[3, :] = [17, 16, 15]
3×3 Matrix{Int64}:
  1   2   3
  4  42   6
 17  16  15

Note that we had to assign a vector because our sliced array is of type Vector:

typeof(my_example_matrix[3, :])
Vector{Int64} (alias for Array{Int64, 1})

The second way we could manipulate an array is to alter its shape. Suppose that you have a 6-element vector and you want to make it a 3x2 matrix. You can do this with reshape, by using the array as the first argument and a tuple of dimensions as the second argument:

six_vector = [1, 2, 3, 4, 5, 6]
three_two_matrix = reshape(six_vector, (3, 2))
3×2 Matrix{Int64}:
 1  4
 2  5
 3  6

You can convert it back to a vector by specifying a tuple with only one dimension as the second argument:

reshape(three_two_matrix, (6, ))
[1, 2, 3, 4, 5, 6]
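One thing to keep in mind, sketched below with fresh variables: reshape returns an array that shares its data with the original, so writing to one is visible through the other. The vec function is a shorthand for reshaping back into a vector.

```julia
v = [1, 2, 3, 4, 5, 6]
m = reshape(v, (3, 2))

m[1, 1] = 100        # writes into the shared data
first_of_v = v[1]    # 100, because v and m share the same memory

flat = vec(m)        # shorthand for reshaping into a one-dimensional array
```

If an independent array is needed, calling copy on the reshaped result is one option.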

The third way we could manipulate an array is to apply a function over every array element. This is where the “dot” operator ., also known as broadcasting, comes in.

logarithm.(my_example_matrix)
3×3 Matrix{Float64}:
 0.0      0.693147  1.09861
 1.38629  3.73767   1.79176
 2.83321  2.77259   2.70805

The dot operator in Julia is extremely versatile. You can even use it to broadcast infix operators:

my_example_matrix .+ 100
3×3 Matrix{Int64}:
 101  102  103
 104  142  106
 117  116  115

An alternative to broadcasting a function over a vector is to use map:

map(logarithm, my_example_matrix)
3×3 Matrix{Float64}:
 0.0      0.693147  1.09861
 1.38629  3.73767   1.79176
 2.83321  2.77259   2.70805

For anonymous functions, map is usually more readable. For example,

map(x -> 3x, my_example_matrix)
3×3 Matrix{Int64}:
  3    6   9
 12  126  18
 51   48  45

is quite clear. However, the same broadcast looks as follows:

(x -> 3x).(my_example_matrix)
3×3 Matrix{Int64}:
  3    6   9
 12  126  18
 51   48  45

Next, map works with slicing:

map(x -> x + 100, my_example_matrix[:, 3])
[103, 106, 115]

Finally, sometimes, and especially when dealing with tabular data, we want to apply a function over all elements in a specific array dimension. This can be done with the mapslices function. Similar to map, the first argument is the function and the second argument is the array. The only change is that we need to specify the dims argument to indicate along which dimension we want to transform the elements.

For example, let’s use mapslices with the sum function along the rows (dims=1), which gives one sum per column, and along the columns (dims=2), which gives one sum per row:

# rows
mapslices(sum, my_example_matrix; dims=1)
1×3 Matrix{Int64}:
 22  60  24

# columns
mapslices(sum, my_example_matrix; dims=2)
3×1 Matrix{Int64}:
  6
 52
 48

Array Iteration

One common operation is to iterate over an array with a for loop. The regular for loop over an array returns each element.

The simplest example is with a vector.

simple_vector = [1, 2, 3]

empty_vector = Int64[]

for i in simple_vector
    push!(empty_vector, i + 1)
end

empty_vector
[2, 3, 4]

Sometimes, you don’t want to loop over each element, but actually over each array index. We can use the eachindex function combined with a for loop to iterate over each array index.

Again, let’s show an example with a vector:

forty_twos = [42, 42, 42]

empty_vector = Int64[]

for i in eachindex(forty_twos)
    push!(empty_vector, i)
end

empty_vector
[1, 2, 3]

In this example, the eachindex(forty_twos) returns the indices of forty_twos, namely [1, 2, 3].
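When both the position and the element are needed, enumerate is a common choice. The sketch below uses made-up data. Note that enumerate counts from 1 regardless of how the array is indexed, which matches the indices for ordinary vectors like this one.

```julia
letters = ["a", "b", "c"]

labelled = String[]
for (i, letter) in enumerate(letters)
    # i counts 1, 2, 3, ...; letter is the corresponding element
    push!(labelled, "$i: $letter")
end
```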

Similarly, we can iterate over matrices. The standard for loop goes first over columns then over rows. It will first traverse all elements in column 1, from the first row to the last row, then it will move to column 2 in a similar fashion until it has covered all columns.

For those familiar with other programming languages: Julia, like most scientific programming languages, is “column-major”. Column-major means that the elements in the column are stored next to each other in memory. This also means that iterating over elements in a column is much quicker than over elements in a row.

Ok, let’s show this in an example:

column_major = [[1 3]
                [2 4]]

row_major = [[1 2]
             [3 4]]

If we loop over the matrix stored in column-major order, then the output is sorted:

indexes = Int64[]

for i in column_major
    push!(indexes, i)
end

indexes
[1, 2, 3, 4]

However, the output isn’t sorted when looping over the other matrix:

indexes = Int64[]

for i in row_major
    push!(indexes, i)
end

indexes
[1, 3, 2, 4]

It is often better to use specialized functions for these loops:

  • eachcol: iterates over an array column by column
  • eachrow: iterates over an array row by row
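As a minimal sketch of such specialized iteration, eachcol and eachrow from Base yield one column or one row at a time. The matrix is redefined here so the block is self-contained.

```julia
column_major = [[1 3]
                [2 4]]

# collect turns each column or row slice into a plain vector.
cols = [collect(c) for c in eachcol(column_major)]   # [[1, 2], [3, 4]]
rows = [collect(r) for r in eachrow(column_major)]   # [[1, 3], [2, 4]]

# Summing each column without indexing by hand.
col_sums = [sum(c) for c in eachcol(column_major)]   # [3, 7]
```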

3.3.8 Pair

Compared to the huge section on arrays, this section on pairs will be brief. Pair is a data structure that holds two objects (which typically belong to each other). We construct a pair in Julia using the following syntax:

my_pair = "Julia" => 42
"Julia" => 42

The elements are stored in the fields first and second:

my_pair.first
Julia
my_pair.second
42

But, in most cases, it’s easier to use first and last:

first(my_pair)
Julia
last(my_pair)
42



Pairs will be used a lot in data manipulation and data visualization since both DataFrames.jl (Section 4) and Makie.jl (Section 6) take objects of type Pair in their main functions. For example, with DataFrames.jl we’re going to see that :a => :b can be used to rename the column :a to :b.
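A Pair can also be unpacked like a two-element tuple, which is handy when looping over collections of pairs. A brief sketch, with illustrative variable names:

```julia
my_pair = "Julia" => 42

# A Pair can be destructured like a two-element tuple.
key, value = my_pair

# The type parameters record the types of both elements
# (Int is Int64 on a 64-bit system).
pair_type = typeof(my_pair)
```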

3.3.9 Dict

If you understood what a Pair is, then Dict won’t be a problem. For all practical purposes, Dicts are mappings from keys to values. By mapping, we mean that if you give a Dict some key, then the Dict can tell you which value belongs to that key. Keys and values can be of any type, but keys are usually strings.

There are two ways to construct Dicts in Julia. The first is by passing a vector of tuples as (key, value) to the Dict constructor:

name2number_map = Dict([("one", 1), ("two", 2)])
Dict{String, Int64} with 2 entries:
  "two" => 2
  "one" => 1

There is a more readable syntax based on the Pair type described above. You can also pass Pairs of key => values to the Dict constructor:

name2number_map = Dict("one" => 1, "two" => 2)
Dict{String, Int64} with 2 entries:
  "two" => 2
  "one" => 1

You can retrieve a Dict’s value by indexing it by the corresponding key:

name2number_map["one"]
1

To add a new entry, you index the Dict by the desired key and assign a value with the assignment = operator:

name2number_map["three"] = 3
3


If you want to check if a Dict has a certain key you can use keys and in:

"two" in keys(name2number_map)
true


To delete a key you can use the delete! function:

delete!(name2number_map, "three")
Dict{String, Int64} with 2 entries:
  "two" => 2
  "one" => 1

Or, to delete a key while returning its value, you can use pop!:

popped_value = pop!(name2number_map, "two")
2

Now, our name2number_map has only one key:

name2number_map
Dict{String, Int64} with 1 entry:
  "one" => 1

Dicts are also used for data manipulation by DataFrames.jl (Section 4) and for data visualization by Makie.jl (Section 6). So, it is important to know their basic functionality.

There is another useful way of constructing Dicts. Suppose that you have two vectors and you want to construct a Dict with one of them as keys and the other as values. You can do that with the zip function which “glues” together two objects (just like a zipper):

A = ["one", "two", "three"]
B = [1, 2, 3]

name2number_map = Dict(zip(A, B))
Dict{String, Int64} with 3 entries:
  "two" => 2
  "one" => 1
  "three" => 3

For instance, we can now get the number 3 via:

name2number_map["three"]
3
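Indexing with a key that does not exist throws a KeyError. As a hedged sketch, haskey and get from Base offer lookups that do not throw. The Dict named lookup is a separate, made-up example so that name2number_map is left alone.

```julia
lookup = Dict("one" => 1, "two" => 2)

has_two = haskey(lookup, "two")       # true
has_ten = haskey(lookup, "ten")       # false

# get returns a fallback instead of throwing a KeyError for a missing key.
ten_or_zero = get(lookup, "ten", 0)   # 0
two_value = get(lookup, "two", 0)     # 2
```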



3.3.10 Symbol

Symbol is actually not a data structure. It is a type and behaves a lot like a string. Instead of surrounding the text by quotation marks, a symbol starts with a colon (:) and can contain underscores:

sym = :some_text
:some_text

We can easily convert a symbol to string and vice versa:

s = string(sym)
some_text

sym = Symbol(s)
:some_text

One simple benefit of symbols is that you have to type one character less, that is, :some_text versus "some_text". We use Symbols a lot in data manipulations with the DataFrames.jl package (Section 4) and data visualizations with the Makie.jl package (Section 6).

3.3.11 Splat Operator

In Julia we have the “splat” operator ... which is used in function calls to expand a collection into a sequence of arguments. We will occasionally use splatting in some function calls in the data manipulation and data visualization chapters.

The most intuitive way to learn about splatting is with an example. The add_elements function below takes three arguments to be added together:

add_elements(a, b, c) = a + b + c
add_elements (generic function with 1 method)

Now, suppose that we have a collection with three elements. The naïve way to do this would be to supply the function with all three elements as function arguments like this:

my_collection = [1, 2, 3]

add_elements(my_collection[1], my_collection[2], my_collection[3])
6

Here is where we use the “splat” operator ... which takes a collection (often an array, vector, tuple, or range) and converts it into a sequence of arguments:

add_elements(my_collection...)
6


The ... is included after the collection that we want to “splat” into a sequence of arguments. In the example above, the following are the same:

add_elements(my_collection...) == add_elements(my_collection[1], my_collection[2], my_collection[3])
true

Anytime Julia sees a splatting operator inside a function call, the collection is converted into a sequence of arguments, one for each element of the collection.

It also works for ranges:

add_elements(1:3...)
6
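The same ... syntax has a complementary meaning in a function definition, where it collects any number of arguments into a tuple. A minimal sketch, using a hypothetical add_all function that is not part of this book:

```julia
# In a definition, `args...` collects any number of arguments into a tuple.
add_all(args...) = sum(args)   # hypothetical helper for illustration

six = add_all(1, 2, 3)               # 6
six_again = add_all([1, 2, 3]...)    # splatting a vector into the varargs
ten = add_all(1:4...)                # 10
```

Unlike add_elements, which accepts exactly three arguments, this version works for collections of any non-zero length.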



CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso