Julia has several native data structures. They are abstractions that represent some form of structured data, and they can hold homogeneous or heterogeneous values. We will cover the most used ones. Since they are collections, they can be looped over with for loops.
We will cover String, Tuple, NamedTuple, UnitRange, Array, Pair, Dict, and Symbol.
When you stumble across a data structure in Julia, you can find methods that accept it as an argument with the methodswith function. In Julia, the distinction between methods and functions is as follows: a function is a named behavior, and every function can have multiple methods, one for each combination of argument types, like we have shown earlier. The methodswith function is nice to have in your bag of tricks. Let’s see what we can do with a String, for example:
first(methodswith(String), 5)
[1] String(s::String) @ Core boot.jl:375
[2] Symbol(s::String) @ Core boot.jl:513
[3] ==(y::PosLenString, x::String) @ WeakRefStrings ~/.julia/packages/WeakRefStrings/31nkb/src/poslenstrings.jl:84
[4] ==(x::String, y::PosLenString) @ WeakRefStrings ~/.julia/packages/WeakRefStrings/31nkb/src/poslenstrings.jl:72
[5] ==(a::String, b::String) @ Base strings/string.jl:143
Before we dive into data structures, we need to talk about broadcasting (also known as vectorization) and the “dot” operator (.).
We can broadcast mathematical operations like * (multiplication) or + (addition) using the dot operator. For example, broadcasted addition would imply a change from + to .+:
[1, 2, 3] .+ 1
[2, 3, 4]
It also works automatically with functions. (Technically, the mathematical operations, or infix operators, are also functions, but that is not so important to know.) Remember our logarithm function?
logarithm.([1, 2, 3])
[0.0, 0.6931471805599569, 1.0986122886681282]
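As a minimal sketch of how the dot syntax behaves with more than one argument, the snippet below defines its own small helper (square_plus is a hypothetical name used only for this illustration) and broadcasts it. It also shows the @. macro from Base, which adds the dot to every call and operator in an expression.

```julia
# Hypothetical helper used only for this illustration.
square_plus(x, y) = x^2 + y

xs = [1, 2, 3]

# Vector and scalar: the scalar is reused for each element.
a = square_plus.(xs, 10)            # [11, 14, 19]

# Two vectors of equal length: elements are paired up.
b = square_plus.(xs, [10, 20, 30])  # [11, 24, 39]

# @. rewrites every operation in the expression into its dotted form.
c = @. xs^2 + 10                    # [11, 14, 19]
```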
It is a Julia convention to append a bang (!) to names of functions that modify one or more of their arguments. This convention warns the user that the function is not pure, i.e., that it has side effects. A function with side effects is useful when you want to update a large data structure or variable container without having all the overhead from creating a new instance.
For example, we can create a function that adds 1 to each element in a vector V:
function add_one!(V)
for i in eachindex(V)
V[i] += 1
end
return nothing
end
my_data = [1, 2, 3]
add_one!(my_data)
my_data
[2, 3, 4]
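For contrast, here is a small sketch that puts a non-mutating counterpart next to the mutating version. The name add_one (without the bang) is an assumption made for this illustration. sort and sort! are a pair from Base that follows the same convention.

```julia
# Non-mutating: returns a new vector and leaves the input untouched.
add_one(V) = V .+ 1

# Mutating: updates the input in place, as in the example above.
function add_one!(V)
    for i in eachindex(V)
        V[i] += 1
    end
    return nothing
end

original = [1, 2, 3]
copy_plus_one = add_one(original)   # original is still [1, 2, 3]
add_one!(original)                  # original is now [2, 3, 4]

# Base follows the same convention: sort returns a new vector, sort! works in place.
unsorted = [3, 1, 2]
sorted_copy = sort(unsorted)        # unsorted is unchanged here
sort!(unsorted)                     # unsorted is now [1, 2, 3]
```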
Strings are delimited by double quotes:
typeof("This is a string")
String
We can also write a multiline string:
text = "
This is a big multiline string.
As you can see.
It is still a String to Julia.
"
This is a big multiline string.
As you can see.
It is still a String to Julia.
But it is usually clearer to use triple quotation marks:
s = """
This is a big multiline string with a nested "quotation".
As you can see.
It is still a String to Julia.
"""
This is a big multiline string with a nested "quotation".
As you can see.
It is still a String to Julia.
When using triple quotes, the indentation and the newline at the start are ignored by Julia. This improves code readability because you can indent the block in your source code without those spaces ending up in your string.
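The dedenting behavior can be checked with a short sketch. The function name greeting is hypothetical. The string is indented to match the function body, yet only the extra two spaces in front of the second line survive.

```julia
function greeting()
    # The common leading indentation (8 spaces here) is stripped from every line,
    # and the newline right after the opening quotes is dropped.
    msg = """
        Hello,
          world!
        """
    return msg
end

greeting() == "Hello,\n  world!\n"   # true
```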
A common string operation is string concatenation. Suppose that you want to construct a new string that is the concatenation of two or more strings. This is accomplished in Julia either with the * operator or the join function. The * symbol might sound like a weird choice and it actually is. For now, many Julia codebases are using this symbol, so it will stay in the language. If you’re interested, you can read a discussion from 2015 about it at https://github.com/JuliaLang/julia/issues/11030.
hello = "Hello"
goodbye = "Goodbye"
hello * goodbye
HelloGoodbye
As you can see, we are missing a space between hello and goodbye. We could concatenate an additional " " string with *, but that would be cumbersome for more than two strings. That’s where the join function comes in handy. We pass the strings as a vector inside the brackets [], followed by the separator:
join([hello, goodbye], " ")
Hello Goodbye
Concatenating strings can be convoluted. We can be much more expressive with string interpolation. It works like this: you specify whatever you want to be included in your string with the dollar sign ($). Here’s the example before but now using interpolation:
"$hello $goodbye"
Hello Goodbye
It even works inside functions. Let’s revisit our test function from Section 3.2.5:
function test_interpolated(a, b)
if a < b
"$a is less than $b"
elseif a > b
"$a is greater than $b"
else
"$a is equal to $b"
end
end
test_interpolated(3.14, 3.14)
3.14 is equal to 3.14
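Interpolation is not limited to plain variable names. As a brief sketch (the variable names are chosen only for this illustration), wrapping an expression in parentheses after the dollar sign interpolates its value, and a backslash escapes a literal dollar sign:

```julia
x = 3
y = 4

# Parentheses let you interpolate an arbitrary expression, not just a variable.
msg = "The sum of $x and $y is $(x + y)"   # "The sum of 3 and 4 is 7"

# A literal dollar sign must be escaped with a backslash.
price = "Total: \$$(x * y)"                # the text Total: $12
```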
There are several functions to manipulate strings in Julia. We will demonstrate the most common ones. Also, note that most of these functions accept a regular expression (regex) as an argument. We won’t cover regular expressions in this book, but you are encouraged to learn about them, especially if most of your work uses textual data.
First, let us define a string for us to play around with:
julia_string = "Julia is an amazing open source programming language"
Julia is an amazing open source programming language
contains, startswith and endswith: conditionals (each returns either true or false) that check whether the second argument is a:
substring of the first argument
contains(julia_string, "Julia")
true
prefix of the first argument
startswith(julia_string, "Julia")
true
suffix of the first argument
endswith(julia_string, "Julia")
false
lowercase, uppercase, titlecase and lowercasefirst:
lowercase(julia_string)
julia is an amazing open source programming language
uppercase(julia_string)
JULIA IS AN AMAZING OPEN SOURCE PROGRAMMING LANGUAGE
titlecase(julia_string)
Julia Is An Amazing Open Source Programming Language
lowercasefirst(julia_string)
julia is an amazing open source programming language
replace: introduces a new syntax, called the Pair:
replace(julia_string, "amazing" => "awesome")
Julia is an awesome open source programming language
split: breaks up a string by a delimiter:
split(julia_string, " ")
SubString{String}["Julia", "is", "an", "amazing", "open", "source", "programming", "language"]
Often, we need to convert between types in Julia. To convert a number to a string we can use the string function:
my_number = 123
typeof(string(my_number))
String
Sometimes, we want the opposite: convert a string to a number. Julia has a handy function for that: parse.
typeof(parse(Int64, "123"))
Int64
Sometimes, we want to play it safe with these conversions. That’s when the tryparse function steps in. It has the same functionality as parse but returns either a value of the requested type or nothing. That makes tryparse handy when we want to avoid errors. Of course, you would need to deal with all those nothing values afterwards.
tryparse(Int64, "A very non-numeric string")
nothing
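One way of dealing with the nothing values is sketched below. The helper parse_or_default is a hypothetical name introduced for this illustration. isnothing and something are functions from Base.

```julia
# Hypothetical helper: parse an integer, falling back to a default on failure.
function parse_or_default(s::AbstractString, default::Int=0)
    parsed = tryparse(Int, s)
    return isnothing(parsed) ? default : parsed
end

parse_or_default("123")        # 123
parse_or_default("abc")        # 0
parse_or_default("abc", -1)    # -1

# something returns the first of its arguments that is not nothing.
value = something(tryparse(Int, "oops"), 42)   # 42
```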
Julia has a data structure called tuple. They are really special in Julia because they are often used in relation to functions. Since functions are an important feature in Julia, every Julia user should know the basics of tuples.
A tuple is a fixed-length container that can hold values of multiple different types. A tuple is an immutable object, meaning that it cannot be modified after instantiation. To construct a tuple, use parentheses () to delimit the beginning and end, along with commas as delimiters between values:
my_tuple = (1, 3.14, "Julia")
(1, 3.14, "Julia")
Here, we are creating a tuple with three values. Each one of the values is a different type. We can access them via indexing. Like this:
my_tuple[2]
3.14
We can also loop over tuples with the for keyword. And even apply functions to tuples. But we can never change any value of a tuple since they are immutable.
Remember functions that return multiple values back in Section 3.2.4.2? Let’s inspect what our add_multiply function returns:
return_multiple = add_multiply(1, 2)
typeof(return_multiple)
Tuple{Int64, Int64}
This is because return a, b is the same as return (a, b):
1, 2
(1, 2)
So, now you can see why they are often related.
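The relationship works in the other direction too: a returned tuple can be unpacked straight into variables. The sketch below uses an assumed definition of add_multiply (the original is defined in an earlier section and may differ in detail) and also shows what happens when code tries to modify a tuple.

```julia
# Assumed definition, mirroring the add_multiply function referenced above.
function add_multiply(a, b)
    return a + b, a * b
end

# Destructuring assigns each tuple element to its own variable.
added, multiplied = add_multiply(2, 3)   # added == 5, multiplied == 6

# Tuples are immutable, so assigning to an element throws an error.
t = (1, 3.14, "Julia")
modification_failed = try
    t[1] = 10
    false
catch err
    err isa MethodError   # setindex! has no method for tuples
end
```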
One more thing about tuples. When you want to pass more than one variable to an anonymous function, guess what you would need to use? Once again: tuples!
map((x, y) -> x^y, 2, 3)
8
Or, even more than two arguments:
map((x, y, z) -> x^y + z, 2, 3, 1)
9
Sometimes, you want to name the values in tuples. That’s when named tuples come in. Their functionality is pretty much the same as that of tuples: they are immutable and can hold any type of value.
The construction of named tuples is slightly different from that of tuples. You have the familiar parentheses () and the comma as value separator. But now you name the values:
my_namedtuple = (i=1, f=3.14, s="Julia")
(i = 1, f = 3.14, s = "Julia")
We can access a named tuple’s values via indexing like regular tuples or, alternatively, access them by their names with the dot (.) syntax:
my_namedtuple.s
Julia
To finish our discussion of named tuples, there is one important shorthand syntax that you’ll see a lot in Julia code. Often Julia users create a named tuple by using the familiar parentheses () and commas, but without explicitly naming the values. To do so, you begin the named tuple construction with a semicolon (;) before the values. Each name is then taken from the variable that supplies the value. This is especially useful when the values that would compose the named tuple are already defined in variables or when you want to avoid long lines:
i = 1
f = 3.14
s = "Julia"
my_quick_namedtuple = (; i, f, s)
(i = 1, f = 3.14, s = "Julia")
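A few more operations on named tuples, shown as a short sketch using only functions from Base. Because named tuples are immutable, changing a value means building a new named tuple, for which merge is one option.

```julia
nt = (i=1, f=3.14, s="Julia")

keys(nt)      # (:i, :f, :s)
values(nt)    # (1, 3.14, "Julia")
nt[:f]        # 3.14, indexing by name
nt[2]         # 3.14, indexing by position

# merge returns a new named tuple; nt itself is left unchanged.
updated = merge(nt, (f=2.71,))   # (i = 1, f = 2.71, s = "Julia")
```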
A range in Julia represents an interval between start and stop boundaries. The syntax is start:stop:
1:10
1:10
As you can see, our instantiated range is of type UnitRange{T}, where T is the type of the elements inside the UnitRange:
typeof(1:10)
UnitRange{Int64}
And, if we gather all the values, we get:
[x for x in 1:10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
We can also construct ranges for other types:
typeof(1.0:10.0)
StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}
Sometimes, we want to change the default step size. We can do that by adding a step size in the range syntax: start:step:stop. For example, suppose we want a range of Float64 from 0 to 1 with steps of size 0.2:
0.0:0.2:1.0
0.0:0.2:1.0
If you want to “materialize” a range into a collection, you can use the function collect:
collect(1:10)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
We have an array of the type specified in the range between the boundaries that we’ve set. Speaking of arrays, let’s talk about them.
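To round off ranges, here is a small sketch of a few inspection functions, a counting-down range, and the range function from Base, which lets you specify the number of points rather than the step size. The variable names are arbitrary.

```julia
# A range stores its start, step and stop rather than every element.
r = 1:2:9
length(r)           # 5
step(r)             # 2
collect(r)          # [1, 3, 5, 7, 9]

# A negative step counts down.
countdown = collect(10:-2:1)      # [10, 8, 6, 4, 2]

# range computes the step from the requested number of points.
pts = range(0, stop=1, length=5)
collect(pts)        # [0.0, 0.25, 0.5, 0.75, 1.0]
```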
In their most basic form, arrays hold multiple objects. For example, they can hold multiple numbers in one dimension:
myarray = [1, 2, 3]
[1, 2, 3]
Most of the time you would want arrays of a single type for performance reasons, but note that they can also hold objects of different types:
myarray = ["text", 1, :symbol]
Any["text", 1, :symbol]
They are the “bread and butter” of data scientists, because arrays are what underlies most data manipulation and data visualization workflows.
Therefore, arrays are an essential data structure.
Let’s start with array types. There are several, but we will focus on the two most used in data science:
Vector{T}: one-dimensional array. Alias for Array{T, 1}.
Matrix{T}: two-dimensional array. Alias for Array{T, 2}.
Note here that T is the element type of the underlying array. So, for example, Vector{Int64} is a Vector in which all elements are Int64s, and Matrix{AbstractFloat} is a Matrix in which every element is of some subtype of AbstractFloat.
Most of the time, especially when dealing with tabular data, we are using either one- or two-dimensional arrays. They are both Array types for Julia. But, we can use the handy aliases Vector and Matrix for clear and concise syntax.
How do we construct an array? In this section, we start by constructing arrays in a low-level way. This can be necessary to write high performing code in some situations. However, in most situations, this is not necessary, and we can safely use more convenient methods to create arrays. These more convenient methods will be described later in this section.
The low-level constructor for Julia arrays is the default constructor. It accepts the element type as the type parameter inside the {} brackets, and inside the constructor you pass an initializer followed by the desired dimensions. It is common to initialize vectors and matrices with undefined elements by using undef as the initializer. The contents are then whatever happened to be in memory, so the values shown here will differ from run to run. A vector of 10 undef Float64 elements can be constructed as:
my_vector = Vector{Float64}(undef, 10)
[6.94165739404193e-310, 6.9416581732965e-310, 6.941769171624e-310, 6.9417615649581e-310, 6.941769171624e-310, 6.941769171624e-310, 5.0e-324, NaN, 5.0e-324, 6.94165876905873e-310]
For matrices, since we are dealing with two-dimensional objects, we need to pass two dimension arguments inside the constructor: one for rows and another for columns. For example, a matrix with 10 rows and 2 columns of undef elements can be instantiated as:
my_matrix = Matrix{Float64}(undef, 10, 2)
10×2 Matrix{Float64}:
2.079e-320 1.46314e-306
0.0 0.0
3.97445e233 3.19008e231
6.08419e-310 1.01185e-319
4.08312e233 6.21074e231
0.0 1.42961e-306
5.53e-322 6.10795e-310
1.72723e-77 6.63138e-315
1.46299e-306 2.37917e77
2.37664e-312 0.0
We also have some syntax aliases for the most common elements in array construction:
zeros for all elements being initialized to zero. Note that the default element type is Float64, which can be changed if necessary:
my_vector_zeros = zeros(10)
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
my_matrix_zeros = zeros(Int64, 10, 2)
10×2 Matrix{Int64}:
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
ones for all elements being initialized to one:
my_vector_ones = ones(Int64, 10)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
my_matrix_ones = ones(10, 2)
10×2 Matrix{Float64}:
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
1.0 1.0
For other elements, we can first instantiate an array with undef elements and use the fill! function to fill all elements of an array with the desired element. Here’s an example with 3.14 (an approximation of \(\pi\)):
my_matrix_π = Matrix{Float64}(undef, 2, 2)
fill!(my_matrix_π, 3.14)
2×2 Matrix{Float64}:
3.14 3.14
3.14 3.14
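Base also has a non-mutating fill that allocates and initializes in one step. The sketch below shows it together with a caveat worth knowing: fill stores the same object in every slot, which matters when the value is mutable. The variable names are arbitrary.

```julia
# fill allocates and initializes in one step.
m = fill(3.14, 2, 2)          # 2×2 Matrix{Float64}
v = fill("n/a", 3)            # 3-element Vector{String}

# Caution: fill reuses one object for every element. With a mutable value
# such as a vector, all entries refer to the same vector.
shared = fill(Int[], 2)
push!(shared[1], 99)          # shared[2] now also contains 99

# A comprehension creates an independent object per element instead.
independent = [Int[] for _ in 1:2]
push!(independent[1], 99)     # independent[2] stays empty
```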
We can also create arrays with array literals. For example, here’s a 2x2 matrix of integers:
[[1 2]
[3 4]]
2×2 Matrix{Int64}:
1 2
3 4
Array literals also accept a type specification before the [] brackets. So, if we want the same 2x2 array as before but now as floats, we can do so:
Float64[[1 2]
[3 4]]
2×2 Matrix{Float64}:
1.0 2.0
3.0 4.0
It also works for vectors:
Bool[0, 1, 0, 1]
Bool[0, 1, 0, 1]
You can even mix and match array literals with the constructors:
[ones(Int, 2, 2) zeros(Int, 2, 2)]
2×4 Matrix{Int64}:
1 1 0 0
1 1 0 0
[zeros(Int, 2, 2)
ones(Int, 2, 2)]
4×2 Matrix{Int64}:
0 0
0 0
1 1
1 1
[ones(Int, 2, 2) [1; 2]
[3 4] 5]
3×3 Matrix{Int64}:
1 1 1
1 1 2
3 4 5
Another powerful way to create an array is to write an array comprehension. This way of creating arrays is better in most cases: it avoids explicit loops, manual indexing, and other error-prone operations. You specify what you want to do inside the [] brackets. For example, say we want to create a vector of squares from 1 to 10:
[x^2 for x in 1:10]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
They also support multiple inputs:
[x*y for x in 1:10 for y in 1:2]
[1, 2, 2, 4, 3, 6, 4, 8, 5, 10, 6, 12, 7, 14, 8, 16, 9, 18, 10, 20]
And conditionals:
[x^2 for x in 1:10 if isodd(x)]
[1, 9, 25, 49, 81]
As with array literals, you can specify your desired type before the [] brackets:
Float64[x^2 for x in 1:10 if isodd(x)]
[1.0, 9.0, 25.0, 49.0, 81.0]
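Comprehensions with several iterators come in two flavors, and the difference is easy to miss. As a small sketch: chaining for clauses produces a flat vector, whereas separating the iterators with a comma produces a multi-dimensional array.

```julia
# Chained for clauses flatten into a vector.
flat = [x * y for x in 1:3 for y in 1:2]    # [1, 2, 2, 4, 3, 6]

# A comma between the iterators yields a matrix: x indexes rows, y indexes columns.
table = [x * y for x in 1:3, y in 1:2]      # 3×2 Matrix{Int64}
```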
Finally, we can also create arrays with concatenation functions. Concatenation is a standard term in computer programming and means “to chain together”. For example, we can concatenate strings with “aa” and “bb” to get “aabb”:
"aa" * "bb"
aabb
And, we can concatenate arrays to create new arrays:
cat: concatenate input arrays along a specific dimension dims
cat(ones(2), zeros(2), dims=1)
[1.0, 1.0, 0.0, 0.0]
cat(ones(2), zeros(2), dims=2)
2×2 Matrix{Float64}:
1.0 0.0
1.0 0.0
vcat: vertical concatenation, a shorthand for cat(...; dims=1)
vcat(ones(2), zeros(2))
[1.0, 1.0, 0.0, 0.0]
hcat: horizontal concatenation, a shorthand for cat(...; dims=2)
hcat(ones(2), zeros(2))
2×2 Matrix{Float64}:
1.0 0.0
1.0 0.0
Once we have arrays, the next logical step is to inspect them. There are a lot of handy functions that allow the user to have an insight into any array.
It is most useful to know what type of elements are inside an array. We can do this with eltype:
eltype(my_matrix_π)
Float64
After knowing its types, one might be interested in array dimensions. Julia has several functions to inspect array dimensions:
length: total number of elements
length(my_matrix_π)
4
ndims: number of dimensions
ndims(my_matrix_π)
2
size: this one is a little tricky. By default it will return a tuple containing the array’s dimensions.
size(my_matrix_π)
(2, 2)
You can get a specific dimension with a second argument to size. Here, the second axis is the columns:
size(my_matrix_π, 2)
2
Sometimes, we want to inspect only certain parts of an array. This is called indexing and slicing. If you want a particular observation of a vector, or a row or column of a matrix, you’ll probably need to index an array.
First, we will create an example vector and matrix to play around:
my_example_vector = [1, 2, 3, 4, 5]
my_example_matrix = [[1 2 3]
[4 5 6]
[7 8 9]]
Let’s start with vectors. Suppose that you want the second element of a vector. You append [] brackets with the desired index inside:
my_example_vector[2]
2
The same syntax follows with matrices. But, since matrices are 2-dimensional arrays, we have to specify both rows and columns. Let’s retrieve the element from the second row (first dimension) and first column (second dimension):
my_example_matrix[2, 1]
4
Julia also has conventional keywords for the first and last elements of an array: begin and end. For example, the second to last element of a vector can be retrieved as:
my_example_vector[end-1]
4
This also works for matrices. Let’s retrieve the element of the last row and second column:
my_example_matrix[end, begin+1]
8
Often, we are not only interested in just one array element, but in a whole subset of array elements. We can accomplish this by slicing an array. It uses the same index syntax, but with the added colon (:) to denote the boundaries that we are slicing through the array. For example, suppose we want to get the 2nd to 4th element of a vector:
my_example_vector[2:4]
[2, 3, 4]
We could do the same with matrices. With matrices in particular, if we want to select all elements along a dimension, we can do so with just a colon (:). For example, to get all the elements in the second row:
my_example_matrix[2, :]
[4, 5, 6]
You can interpret this with something like “take the 2nd row and all the columns”.
It also supports begin and end:
my_example_matrix[begin+1:end, end]
[6, 9]
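One detail worth knowing about slices: indexing with a range or colon on the right-hand side creates a copy. When a copy is not wanted, the @view macro from Base gives a window into the original array instead. A minimal sketch, using its own small matrix M:

```julia
M = [1 2 3; 4 5 6; 7 8 9]

row_copy = M[2, :]          # slicing allocates a new vector
row_copy[1] = 100           # M is unchanged

row_view = @view M[2, :]    # a view refers to the memory of M
row_view[1] = -4            # this writes through to M[2, 1]
```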
There are several ways we could manipulate an array. The first would be to manipulate a single element of the array. We just index the array by the desired element and proceed with an assignment (=):
my_example_matrix[2, 2] = 42
my_example_matrix
3×3 Matrix{Int64}:
1 2 3
4 42 6
7 8 9
Or, you can manipulate a certain subset of elements of the array. In this case, we need to slice the array and then assign with =:
my_example_matrix[3, :] = [17, 16, 15]
my_example_matrix
3×3 Matrix{Int64}:
1 2 3
4 42 6
17 16 15
Note that we had to assign a vector because our sliced array is of type Vector:
typeof(my_example_matrix[3, :])
Vector{Int64} (alias for Array{Int64, 1})
The second way we could manipulate an array is to alter its shape. Suppose that you have a 6-element vector and you want to make it a 3x2 matrix. You can do this with reshape, by using the array as the first argument and a tuple of dimensions as the second argument:
six_vector = [1, 2, 3, 4, 5, 6]
three_two_matrix = reshape(six_vector, (3, 2))
three_two_matrix
3×2 Matrix{Int64}:
1 4
2 5
3 6
You can convert it back to a vector by specifying a tuple with only one dimension as the second argument:
reshape(three_two_matrix, (6, ))
[1, 2, 3, 4, 5, 6]
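Two practical notes on reshape, sketched below with functions from Base: a colon lets reshape infer one dimension, and the reshaped array shares its memory with the original, so writing to one is visible in the other. The snippet defines its own six_vector so that it can be run in isolation.

```julia
six_vector = [1, 2, 3, 4, 5, 6]

# A colon lets reshape infer one dimension from the total length.
m = reshape(six_vector, 3, :)     # 3×2 Matrix{Int64}

# The reshaped array shares memory with the original vector.
m[1, 2] = 40                      # six_vector[4] is now 40

# vec flattens any array back into a vector, column by column.
flat = vec(m)                     # [1, 2, 3, 40, 5, 6]

# Use copy when an independent array is needed.
independent = copy(m)
independent[1, 1] = -1            # six_vector[1] is still 1
```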
The third way we could manipulate an array is to apply a function over every array element. This is where the “dot” operator (.), also known as broadcasting, comes in.
logarithm.(my_example_matrix)
3×3 Matrix{Float64}:
0.0 0.693147 1.09861
1.38629 3.73767 1.79176
2.83321 2.77259 2.70805
The dot operator in Julia is extremely versatile. You can even use it to broadcast infix operators:
my_example_matrix .+ 100
3×3 Matrix{Int64}:
101 102 103
104 142 106
117 116 115
An alternative to broadcasting a function over an array is to use map:
map(logarithm, my_example_matrix)
3×3 Matrix{Float64}:
0.0 0.693147 1.09861
1.38629 3.73767 1.79176
2.83321 2.77259 2.70805
For anonymous functions, map is usually more readable. For example,
map(x -> 3x, my_example_matrix)
3×3 Matrix{Int64}:
3 6 9
12 126 18
51 48 45
is quite clear. However, the same broadcast looks as follows:
(x -> 3x).(my_example_matrix)
3×3 Matrix{Int64}:
3 6 9
12 126 18
51 48 45
Next, map works with slicing:
map(x -> x + 100, my_example_matrix[:, 3])
[103, 106, 115]
Finally, sometimes, and especially when dealing with tabular data, we want to apply a function over all elements in a specific array dimension. This can be done with the mapslices function. Similar to map, the first argument is the function and the second argument is the array. The only change is that we need to specify the dims keyword argument to flag the dimension along which we want to transform the elements.
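For example, let’s use mapslices with the sum function along the first dimension (dims=1, which sums down the rows and gives one total per column) and along the second dimension (dims=2, which sums across the columns and gives one total per row):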
# dims=1: sum down the rows
mapslices(sum, my_example_matrix; dims=1)
1×3 Matrix{Int64}:
22 60 24
# dims=2: sum across the columns
mapslices(sum, my_example_matrix; dims=2)
3×1 Matrix{Int64}:
6
52
48
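For a reduction like sum, the same result is available without mapslices, because sum itself accepts a dims keyword. The sketch below rebuilds the matrix from the example under the name M so that it can be run on its own, and also shows dropdims from Base for getting a plain vector back.

```julia
M = [1 2 3; 4 42 6; 17 16 15]

col_totals = sum(M; dims=1)     # 1×3 Matrix with 22, 60, 24
row_totals = sum(M; dims=2)     # 3×1 Matrix with 6, 52, 48

# Same results as mapslices with sum.
col_totals == mapslices(sum, M; dims=1)   # true
row_totals == mapslices(sum, M; dims=2)   # true

# dropdims removes the singleton dimension when a plain vector is wanted.
col_vector = dropdims(col_totals; dims=1) # [22, 60, 24]
```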
One common operation is to iterate over an array with a for loop. A regular for loop over an array yields each element in turn.
The simplest example is with a vector.
simple_vector = [1, 2, 3]
empty_vector = Int64[]
for i in simple_vector
push!(empty_vector, i + 1)
end
empty_vector
[2, 3, 4]
Sometimes, you don’t want to loop over each element, but actually over each array index. We can use the eachindex function combined with a for loop to iterate over each array index.
Again, let’s show an example with a vector:
forty_twos = [42, 42, 42]
empty_vector = Int64[]
for i in eachindex(forty_twos)
push!(empty_vector, i)
end
empty_vector
[1, 2, 3]
In this example, eachindex(forty_twos) returns the indices of forty_twos, namely 1, 2, and 3.
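When both the position and the element are needed, Base offers enumerate and pairs. The following is a small sketch with made-up data:

```julia
letters = ["a", "b", "c"]

# enumerate yields (counter, element); the counter always starts at 1.
labelled = String[]
for (i, letter) in enumerate(letters)
    push!(labelled, "$i:$letter")
end
labelled    # ["1:a", "2:b", "3:c"]

# pairs yields (index, element) using the array's own indices, which is the
# safer choice for arrays whose indices do not start at 1.
indices = [idx for (idx, _) in pairs(letters)]   # [1, 2, 3]
```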
Similarly, we can iterate over matrices. The standard for loop goes first over columns then over rows. It will first traverse all elements in column 1, from the first row to the last row, then it will move to column 2 in a similar fashion until it has covered all columns.
For those familiar with other programming languages: Julia, like most scientific programming languages, is “column-major”. Column-major means that the elements in the column are stored next to each other in memory. This also means that iterating over elements in a column is much quicker than over elements in a row.
Ok, let’s show this in an example:
column_major = [[1 3]
[2 4]]
row_major = [[1 2]
[3 4]]
If we loop over the matrix whose values are laid out in column-major order, then the output is sorted:
indexes = Int64[]
for i in column_major
push!(indexes, i)
end
indexes
[1, 2, 3, 4]
However, the output isn’t sorted when looping over the other matrix:
indexes = Int64[]
for i in row_major
push!(indexes, i)
end
indexes
[1, 3, 2, 4]
It is often better to use specialized functions for these loops:
eachcol: iterates over the columns of a matrix
first(eachcol(column_major))
[1, 2]
eachrow: iterates over the rows of a matrix
first(eachrow(column_major))
[1, 3]
Compared to the huge section on arrays, this section on pairs will be brief. Pair is a data structure that holds two objects (which typically belong to each other). We construct a pair in Julia using the following syntax:
my_pair = "Julia" => 42
"Julia" => 42
The elements are stored in the fields first and second.
my_pair.first
Julia
my_pair.second
42
But, in most cases, it’s easier to use first and last:
first(my_pair)
Julia
last(my_pair)
42
Pairs will be used a lot in data manipulation and data visualization since both DataFrames.jl (Section 4) and Makie.jl (Section 6) take objects of type Pair in their main functions. For example, with DataFrames.jl we’re going to see that :a => :b can be used to rename the column :a to :b.
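Two more things that are handy to know about pairs, sketched here with Base only: a Pair can be unpacked like a two-element tuple, and functions such as first and last can be broadcast over a vector of pairs. The variable names are arbitrary.

```julia
my_pair = "Julia" => 42

# A Pair can be destructured like a two-element tuple.
name, number = my_pair        # name == "Julia", number == 42

typeof(my_pair)               # Pair{String, Int64}

# A vector of pairs can be taken apart with broadcasting.
pair_list = ["one" => 1, "two" => 2]
names = first.(pair_list)     # ["one", "two"]
numbers = last.(pair_list)    # [1, 2]
```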
If you understood what a Pair is, then Dict won’t be a problem. For all practical purposes, Dicts are mappings from keys to values. By mapping, we mean that if you give a Dict some key, then the Dict can tell you which value belongs to that key. Keys and values can be of any type, but keys are often strings.
There are two ways to construct Dicts in Julia. The first is by passing a vector of (key, value) tuples to the Dict constructor:
name2number_map = Dict([("one", 1), ("two", 2)])
Dict{String, Int64} with 2 entries:
"two" => 2
"one" => 1
There is a more readable syntax based on the Pair type described above. You can also pass Pairs of key => value to the Dict constructor:
name2number_map = Dict("one" => 1, "two" => 2)
Dict{String, Int64} with 2 entries:
"two" => 2
"one" => 1
You can retrieve a Dict’s value by indexing it by the corresponding key:
name2number_map["one"]
1
To add a new entry, you index the Dict by the desired key and assign a value with the assignment operator (=):
name2number_map["three"] = 3
3
If you want to check if a Dict has a certain key, you can use keys and in:
"two" in keys(name2number_map)
true
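Base also provides haskey for the same check, along with get and get! for lookups that should not throw a KeyError when the key is missing. The sketch below uses a separate Dict named lookup (a name chosen for this illustration) so that it does not interfere with the running example.

```julia
lookup = Dict("one" => 1, "two" => 2)

haskey(lookup, "two")        # true
haskey(lookup, "ten")        # false

# get returns a fallback instead of throwing a KeyError for a missing key.
get(lookup, "ten", 0)        # 0

# get! additionally stores the fallback under the missing key.
get!(lookup, "three", 3)     # 3, and "three" => 3 is now stored
```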
To delete a key, you can use the delete! function:
delete!(name2number_map, "three")
Dict{String, Int64} with 2 entries:
"two" => 2
"one" => 1
Or, to delete a key while returning its value, you can use pop!:
popped_value = pop!(name2number_map, "two")
2
Now, our name2number_map has only one key:
name2number_map
Dict{String, Int64} with 1 entry:
"one" => 1
Dicts are also used for data manipulation by DataFrames.jl (Section 4) and for data visualization by Makie.jl (Section 6). So, it is important to know their basic functionality.
There is another useful way of constructing Dicts. Suppose that you have two vectors and you want to construct a Dict with one of them as keys and the other as values. You can do that with the zip function which “glues” together two objects (just like a zipper):
A = ["one", "two", "three"]
B = [1, 2, 3]
name2number_map = Dict(zip(A, B))
Dict{String, Int64} with 3 entries:
"two" => 2
"one" => 1
"three" => 3
For instance, we can now get the number 3 via:
name2number_map["three"]
3
Symbol is actually not a data structure. It is a type and behaves a lot like a string. Instead of surrounding the text by quotation marks, a symbol starts with a colon (:) and can contain underscores:
sym = :some_text
:some_text
We can easily convert a symbol to string and vice versa:
s = string(sym)
some_text
sym = Symbol(s)
:some_text
One simple benefit of symbols is that you have to type one character less, that is, :some_text versus "some_text". We use Symbols a lot in data manipulations with the DataFrames.jl package (Section 4) and data visualizations with the Makie.jl package (Section 6).
In Julia we have the “splat” operator (...), which is used in function calls to expand a collection into a sequence of arguments. We will occasionally use splatting in some function calls in the data manipulation and data visualization chapters.
The most intuitive way to learn about splatting is with an example. The add_elements function below takes three arguments to be added together:
add_elements(a, b, c) = a + b + c
add_elements (generic function with 1 method)
Now, suppose that we have a collection with three elements. The naïve way to do this would be to supply the function with all three elements as function arguments like this:
my_collection = [1, 2, 3]
add_elements(my_collection[1], my_collection[2], my_collection[3])
6
Here is where we use the “splat” operator (...), which takes a collection (often an array, vector, tuple, or range) and converts it into a sequence of arguments:
add_elements(my_collection...)
6
The ... is included after the collection that we want to “splat” into a sequence of arguments. In the example above, the following are the same:
add_elements(my_collection...) == add_elements(my_collection[1], my_collection[2], my_collection[3])
true
Anytime Julia sees a splatting operator inside a function call, the collection will be expanded into a sequence of arguments, one per element, as if they had been written out separated by commas.
It also works for ranges:
add_elements(1:3...)
6
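The same three dots also appear in two related places that are easy to confuse with call-site splatting. As a short sketch (add_all and describe are hypothetical names for this illustration): in a function definition, ... collects any number of arguments into a tuple, and after a semicolon in a call, splatting a named tuple passes its entries as keyword arguments.

```julia
# In a function definition, ... collects any number of arguments into a tuple.
add_all(numbers...) = sum(numbers)

add_all(1, 2, 3)            # 6
add_all([1, 2, 3, 4]...)    # 10, splatting at the call site

# After a semicolon, splatting a named tuple passes keyword arguments.
describe(; name, version) = "$name v$version"
options = (name="Julia", version=1)
describe(; options...)      # "Julia v1"
```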