3.4 Julia Standard Library

Julia has a rich standard library that ships with every Julia installation. Unlike everything we have seen so far (types, data structures and the filesystem), standard library modules must be imported into your environment before you can use their functions and types.

This is done with the using keyword:

using ModuleName

Now you can access all the functions and types that ModuleName exports.
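As a small illustration of what using brings into scope, here is a minimal sketch with the Dates module, which is covered in the next section. It assumes that Date is exported by Dates, while format is called with the qualified Dates.format form, as it appears in the Dates documentation.

```julia
using Dates

# Exported names are available without a prefix after `using`.
d = Date(2021, 1, 1)

# Other names in the module are reached with the Module.function syntax.
s = Dates.format(d, "yyyy-mm-dd")
println(s)
```

The qualified form also works for exported names, so Dates.Date(2021, 1, 1) would be equivalent to the first call.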

3.4.1 Dates

Handling dates and timestamps is an important part of data science. As we said in the Why Julia? section (Section 2), Python’s pandas uses its own Datetime type to handle dates. The same is true of the lubridate package from R’s tidyverse, which also defines its own datetime type. Julia doesn’t need any of this: all the date functionality is already baked into its standard library, in a module named Dates.

To begin, let’s import the Dates module:

using Dates

3.4.1.1 Date and DateTime Types

The Dates standard library module has two types for working with dates:

  1. Date: representing time in days; and
  2. DateTime: representing time in millisecond precision.

We can construct Date and DateTime with the default constructor by passing integers that represent the year, month, day, hour and so on:

Date(1987) # year
1987-01-01
Date(1987, 9) # month
1987-09-01
Date(1987, 9, 13) # day
1987-09-13
DateTime(1987, 9, 13, 21) # hour
1987-09-13T21:00:00
DateTime(1987, 9, 13, 21, 21) # minute
1987-09-13T21:21:00

For the curious, September 13th 1987, 21:21 is the official time of birth of the first author, Jose.

We can also pass Period types to the default constructor. Period types are the computer’s representation of the units of time that humans use, such as days or hours. Julia’s Dates module has the following abstract subtypes of Period:

subtypes(Period)
DatePeriod
TimePeriod

These divide into the following concrete types, which are pretty much self-explanatory:

subtypes(DatePeriod)
Day
Month
Quarter
Week
Year
subtypes(TimePeriod)
Hour
Microsecond
Millisecond
Minute
Nanosecond
Second

So we could alternatively construct Jose’s official time of birth as:

DateTime(Year(1987), Month(9), Day(13), Hour(21), Minute(21))
1987-09-13T21:21:00

3.4.1.2 Parsing Dates

Most of the time we won’t be constructing Date or DateTime instances from scratch. More likely, we will be parsing strings into Date or DateTime types.

The Date and DateTime constructors can be fed a string and a format string. For example, the string "19870913" representing September 13th 1987 can be parsed with:

Date("19870913", "yyyymmdd")
1987-09-13

Notice that the second argument is a string representation of the format. We have the first four digits representing year y, followed by two digits for month m and finally two digits for day d.

It also works for timestamps with DateTime:

DateTime("1987-09-13T21:21:00", "yyyy-mm-ddTHH:MM:SS")
1987-09-13T21:21:00

You can find more on how to specify different formats as strings in the Julia Dates’ documentation. Don’t worry if you have to revisit it all the time; we ourselves have to do so whenever we work with dates and timestamps.
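To make the role of the format string more concrete, here is a minimal sketch that parses a day/month/year string and then formats the result back into a string. The input "13/09/1987" is an illustrative value, not one taken from a dataset.

```julia
using Dates

# The delimiters in the format string must match those in the input.
d = Date("13/09/1987", "dd/mm/yyyy")

# Dates.format goes the other way, from a Date back to a string.
s = Dates.format(d, "yyyy-mm-dd")
println(s)
```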

According to Julia Dates’ documentation, using the Date(date_string, format_string) method is fine if only called a few times. If there are many similarly formatted date strings to parse, however, it is much more efficient to first create a DateFormat type, and then pass it instead of a raw format string. So our previous example would become:

format = DateFormat("yyyymmdd")
Date("19870913", format)
1987-09-13

Alternatively, without loss of performance, you can use the string literal prefix dateformat"...":

Date("19870913", dateformat"yyyymmdd")
1987-09-13
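The benefit of a reusable format shows up when parsing many strings. The sketch below assumes a small, made-up vector of date strings that all share the yyyymmdd layout; the format is built once and reused for every element.

```julia
using Dates

# Hypothetical input: several date strings sharing one format.
date_strings = ["19870913", "20210101", "20211231"]

# Build the format once, outside the loop.
fmt = dateformat"yyyymmdd"

# Reuse it for every string.
dates = [Date(s, fmt) for s in date_strings]
```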

3.4.1.3 Extracting Date Information

It is easy to extract desired information from Date and DateTime objects. First, let’s create an instance of a very special date:

my_birthday = Date("1987-09-13")
1987-09-13

We can extract anything we want from my_birthday:

year(my_birthday)
1987
month(my_birthday)
9
day(my_birthday)
13

Julia’s Dates module also has compound functions that return a tuple of values:

yearmonth(my_birthday)
(1987, 9)
monthday(my_birthday)
(9, 13)
yearmonthday(my_birthday)
(1987, 9, 13)

We can also see day of the week and other handy stuff:

dayofweek(my_birthday)
7
dayname(my_birthday)
Sunday
dayofweekofmonth(my_birthday) # second Sunday
2

Yep, Jose was born on the second Sunday of September.

NOTE: Here’s a handy tip to keep only the weekdays from a collection of Date instances: filter on dayofweek(your_date) <= 5. For business days, check the BusinessDays.jl package.
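The weekday tip can be sketched as follows. The week of dates is an arbitrary example; it is built with the colon operator, which is explained in the section on date intervals.

```julia
using Dates

# One week of dates; 2021-01-01 was a Friday.
week = collect(Date(2021, 1, 1):Day(1):Date(2021, 1, 7))

# dayofweek returns 1 (Monday) through 7 (Sunday),
# so <= 5 keeps Monday to Friday.
weekdays = filter(d -> dayofweek(d) <= 5, week)
```

Here the Saturday and Sunday (January 2nd and 3rd) are dropped, leaving five dates.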

3.4.1.4 Date Operations

We can perform operations on Dates instances. For example, we can add days to a Date or DateTime instance. Notice that Julia’s Dates will automatically perform the adjustments necessary for leap years and for months with 30 or 31 days (this is known as calendrical arithmetic).

my_birthday + Day(90)
1987-12-12

We can add as many as we like:

my_birthday + Day(90) + Month(2) + Year(1)
1989-02-11
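One subtlety is worth knowing here. According to the Dates documentation, when a date is followed by two or more adjacent periods in a single expression, the periods are applied largest first rather than strictly from left to right. The sketch below reuses the example from that documentation; adding parentheses forces a different order and, in this case, a different result.

```julia
using Dates

d = Date(2014, 1, 29)

# Adjacent periods are combined and applied largest first: Month, then Day.
unparenthesized = d + Day(1) + Month(1)

# Parentheses force the Day to be added before the Month.
day_first = (d + Day(1)) + Month(1)

println(unparenthesized)
println(day_first)
```

If the order matters for your application, parenthesizing each step makes the intent explicit.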

To get the duration between two dates, we just use the subtraction - operator. Let’s see how many days old Jose is:

today() - my_birthday
12428 days

The default duration of Date types is a Day instance. For DateTime, the default duration is a Millisecond instance:

DateTime(today()) - DateTime(my_birthday)
1073779200000 milliseconds
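If you need the duration as a plain number, for example to do further arithmetic, Dates.value returns the integer stored inside a period. A minimal sketch with fixed dates, so that the result does not depend on today():

```julia
using Dates

# Subtracting two Dates gives a Day period.
elapsed = Date(2021, 1, 7) - Date(2021, 1, 1)

# Dates.value strips the unit and returns the underlying integer.
n_days = Dates.value(elapsed)

# The same span between DateTime values is expressed in milliseconds.
elapsed_ms = DateTime(2021, 1, 7) - DateTime(2021, 1, 1)
n_ms = Dates.value(elapsed_ms)
```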

3.4.1.5 Date Intervals

One nice thing about the Dates module is that we can also easily construct date and time intervals. Julia is clever enough not to have to redefine the interval types and operations that we covered in Section 3.2.5. It just extends the functions and operations defined for UnitRange to Date’s types. This is known as multiple dispatch, which we already covered in Why Julia? (Section 2).

For example, suppose you want to create a Day interval. This is easily done with the colon : operator:

Date("2021-01-01"):Day(1):Date("2021-01-07")
2021-01-01
2021-01-02
2021-01-03
2021-01-04
2021-01-05
2021-01-06
2021-01-07

There is nothing special about using Day(1) as the interval; we can use any Period type. For example, using 3 days as the interval:

Date("2021-01-01"):Day(3):Date("2021-01-07")
2021-01-01
2021-01-04
2021-01-07

Or even months:

Date("2021-01-01"):Month(1):Date("2021-03-01")
2021-01-01
2021-02-01
2021-03-01

Note that the type of this interval is a StepRange with the Date and concrete Period type we used as interval inside the colon : operator:

my_date_interval = Date("2021-01-01"):Month(1):Date("2021-03-01")
typeof(my_date_interval)
StepRange{Date, Month}

We can convert this to a vector with the collect function:

my_date_interval_vector = collect(my_date_interval)
2021-01-01
2021-02-01
2021-03-01

This gives us all the array functionality, such as indexing:

my_date_interval_vector[end]
2021-03-01

We can also broadcast date operations to our vector of Dates:

my_date_interval_vector .+ Day(10)
2021-01-11
2021-02-11
2021-03-11

All we’ve done with Date types can be extended to DateTime types in the same manner.
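As a brief sketch of the DateTime case, the example below builds a range of timestamps six hours apart, collects it into a vector and broadcasts an addition over it. The dates and step sizes are arbitrary choices for illustration.

```julia
using Dates

# Every six hours over a single day.
timestamps = DateTime(2021, 1, 1, 0):Hour(6):DateTime(2021, 1, 1, 18)

# collect turns the range into a Vector{DateTime}.
timestamp_vector = collect(timestamps)

# Broadcasting works the same way as with Date.
shifted = timestamp_vector .+ Minute(30)
```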

3.4.2 Random Numbers

Another important module in Julia’s standard library is the Random module. This module deals with random number generation. Random is a rich library and, if you are interested in it, you should consult Julia’s Random documentation. We will cover only three functions: seed!, rand and randn.

To begin we first import the Random module:

using Random

We have two main functions that generate random numbers: rand and randn.

NOTE: Those two functions are already in the Julia Base module, so you don’t need to import Random if you are only planning to use them.

3.4.2.1 rand

By default, if you call rand without arguments it will return a Float64 in the interval [0, 1), which means between 0 inclusive and 1 exclusive:

rand()
0.6888824123218602

You can modify rand arguments in several ways. For example, suppose you want more than 1 random number:

rand(3)
[0.9354721547776108, 0.0717121876541984, 0.22350647843540172]

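Or you want to sample from a different set of values, such as the range 1.0:10.0 (that is, one of the values 1.0, 2.0, and so on up to 10.0):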

rand(1.0:10.0)
1.0
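Keep in mind that rand(1.0:10.0) picks one of the ten values in the range, not an arbitrary real number between 1.0 and 10.0. If you want a continuous uniform draw on an interval [a, b), one common approach is to rescale the output of rand(). The sketch below is one way to do it; the names a and b are illustrative, and the upper bound is exclusive only up to floating-point rounding.

```julia
# Bounds of the desired interval; the names are illustrative.
a, b = 1.0, 10.0

# rand() is uniform on [0, 1), so this is uniform on [a, b).
x = a + (b - a) * rand()

# The same idea for several draws, using broadcasting.
xs = a .+ (b - a) .* rand(5)
```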

You can also specify a different step size inside the interval and a different type. Here we are using numbers without the decimal point, so Julia will interpret them as Int64:

rand(2:2:20)
8

You can also mix and match arguments:

rand(2:2:20, 3)
[10, 16, 8]

It also supports a collection of elements as a tuple:

rand((42, "Julia", 3.14))
3.14

And also arrays:

rand([1, 2, 3])
2

Dicts:

rand(Dict("one"=>1, "two"=>2))
"two" => 2

To finish off all the rand argument options, you can specify the desired dimensions in a tuple. If you do this, the returned type will be an array. For example, a 2x2 matrix of Float64 values drawn from the range 1.0:3.0:

rand(1.0:3.0, (2, 2))
2×2 Matrix{Float64}:
 3.0  3.0
 2.0  2.0

3.4.2.2 randn

randn follows the same general principle as rand, but it only returns numbers generated from the standard normal distribution, that is, the normal distribution with mean 0 and standard deviation 1. The default type is Float64 and it only allows for subtypes of AbstractFloat or Complex:

randn()
-0.13156649838701556

We can only specify the size:

randn((2, 2))
2×2 Matrix{Float64}:
 1.00204    0.210124
 0.353439  -0.00234296

3.4.2.3 seed!

To finish off the Random overview, let’s talk about reproducibility. Often, we want to make something replicable, meaning that we want the random number generator to generate the same sequence of random numbers, however paradoxical that might sound. We can do so with the seed! function.

Here is an example of rand generating the same three numbers given a certain seed:

Random.seed!(123)
rand(3)
[0.7684476751965699, 0.940515000715187, 0.6739586945680673]
Random.seed!(123)
rand(3)
[0.7684476751965699, 0.940515000715187, 0.6739586945680673]

Note that seed! is not automatically exported by the Random module. We have to call it with the Module.function syntax.

To avoid tedious and inefficient repetition of seed! all over the place, we can instead store the random number generator that seed! returns and pass it as the first argument of either rand or randn.

my_seed = Random.seed!(123)
MersenneTwister(123)
rand(my_seed, 3)
[0.3954531123351086, 0.3132439558075186, 0.6625548164736534]
rand(my_seed, 3)
[0.3954531123351086, 0.3132439558075186, 0.6625548164736534]

Keep in mind that each call advances the state of the generator: within a single session, the second rand(my_seed, 3) returns the same numbers as the first only if the generator is re-seeded in between.

NOTE: If you want your code to be reproducible, you can just call Random.seed! at the beginning of your script. This will take care of reproducibility in sequential Random operations. There is no need to call it before every use of rand and randn.
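Another common pattern, sketched here as an alternative rather than as the approach used above, is to construct a dedicated generator and pass it explicitly to rand or randn. The example assumes the MersenneTwister generator exported by Random; two generators created from the same seed produce the same sequence, and each call advances the state of the generator it is given.

```julia
using Random

# Two independent generators created from the same seed.
rng_a = MersenneTwister(123)
rng_b = MersenneTwister(123)

first_a = rand(rng_a, 3)
first_b = rand(rng_b, 3)   # same seed and state, so same numbers as first_a

# A further call on rng_a advances its state and yields new numbers.
second_a = rand(rng_a, 3)
```

Passing the generator explicitly keeps the random stream of one part of a program independent from calls to rand elsewhere.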

3.4.3 Downloads

One last thing from Julia’s standard library for us to cover is the Downloads module. It will be really brief because we will only be covering a single function named download.

Suppose you want to download a file from the internet to your local storage. You can accomplish this with the download function. The first and only required argument is the file’s URL. You can also specify as a second argument the desired output path for the downloaded file (don’t forget the filesystem best practices!). If you don’t specify a second argument, Julia will, by default, download to a temporary path generated with the tempname function.

Let’s import the Downloads module:

using Downloads

For example, let’s download the Project.toml file from our JuliaDataScience GitHub repository. Note that the download function is not exported by the Downloads module, so we have to use the Module.function syntax. By default, it returns a string that holds the path of the downloaded file:

url = "https://raw.githubusercontent.com/JuliaDataScience/JuliaDataScience/main/Project.toml"

my_file = Downloads.download(url) # temporary path from tempname()
/tmp/jl_5eREGu

Let’s just show the first 4 lines of our downloaded file with the readlines function:

readlines(my_file)[1:4]
4-element Vector{String}:
 "name = \"JDS\""
 "uuid = \"6c596d62-2771-44f8-8373-3ec4b616ee9d\""
 "authors = [\"Jose Storopoli\", \"Rik Huijzer\", \"Lazaro Alonso\"]"
 "version = \"0.1.0\""

NOTE: If you want to interact with web requests or web APIs, you would probably need to use the HTTP.jl package.
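To save a download under a path of your choosing instead of a temporary one, you can pass the output path as the second argument of Downloads.download. The sketch below wraps this in a hypothetical helper named download_to (not part of Downloads) that builds the path with joinpath, in line with the filesystem best practices mentioned earlier. The call itself is left as a comment because it needs network access.

```julia
using Downloads

# Hypothetical helper: download `url` into `dir` under `filename`
# and return the path of the saved file.
function download_to(url::AbstractString, dir::AbstractString, filename::AbstractString)
    mkpath(dir)                      # make sure the target directory exists
    dest = joinpath(dir, filename)   # build the path portably
    return Downloads.download(url, dest)
end

# Usage (needs network access):
# url = "https://raw.githubusercontent.com/JuliaDataScience/JuliaDataScience/main/Project.toml"
# download_to(url, mktempdir(), "Project.toml")
```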



CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer and Lazaro Alonso