3.4 Filesystem

In data science, most projects are collaborative. We share code, data, tables, figures and so on. Underneath all of it sits the operating system (OS) filesystem. In a perfect world, the same program would give the same output on different operating systems. Unfortunately, that is not always the case. One example is the difference between Windows paths, such as C:\user\john, and Linux paths, such as /home/john. This is why it is important to discuss filesystem best practices.

Julia has native filesystem capabilities that handle the differences between operating systems. They live in the Filesystem module of Julia’s Base library.

Whenever you are dealing with files such as CSV files, Excel files or other Julia scripts, make sure that your code works on different OS filesystems. This is easily accomplished with the joinpath and pkgdir functions and the @__FILE__ macro.

If you write your code in a package, you can use pkgdir to get the root directory of the package. For example, for the Julia Data Science (JDS) package that we use to produce this book, the root directory is:

root = pkgdir(JDS)
/home/runner/work/JuliaDataScience/JuliaDataScience

As you can see, the code to produce this book was running on a Linux computer. If you’re using a script, you can get the location of the script file via:

root = dirname(@__FILE__)

The nice thing about these two commands is that they do not depend on how the user started Julia. In other words, it doesn’t matter whether the user started the program with julia scripts/script.jl or julia script.jl; in both cases the paths are the same.
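To make the contrast concrete, here is a minimal sketch comparing a path anchored at the script’s location with a bare relative path. The data/my_data.csv layout is only an illustration, and the temporary directory stands in for "the user started Julia somewhere else".

```julia
# Anchored at the location of this source file, so it does not depend on
# the directory from which Julia was started.
script_dir = dirname(@__FILE__)
anchored = joinpath(script_dir, "data", "my_data.csv")

# A bare relative path is resolved against the current working directory,
# so it changes when the working directory changes.
relative = joinpath("data", "my_data.csv")
resolved_here = abspath(relative)

# Hypothetical "other" starting directory, simulated with a temp directory.
resolved_elsewhere = cd(mktempdir()) do
    abspath(relative)
end

println(resolved_here == resolved_elsewhere)  # the two resolutions differ
```

The anchored path stays the same no matter what the working directory is, whereas the relative one silently points somewhere else.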

The next step would be to include the relative path from root to our desired file. Since different operating systems have different ways to construct relative paths with subfolders (some use forward slashes / while others use backslashes \), we cannot simply concatenate the file’s relative path with the root string. For that, we have the joinpath function, which joins relative paths and filenames according to your specific OS filesystem implementation.
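As a rough illustration of why concatenation is fragile, the sketch below builds the same path both ways. The root value "project" is a hypothetical placeholder, not the book’s actual root.

```julia
root = "project"  # hypothetical root directory, for illustration only

# Fragile: hard-codes the forward slash, which is not the Windows separator.
fragile = root * "/" * "data" * "/" * "my_data.csv"

# Robust: joinpath inserts the separator of the OS that runs the code.
robust = joinpath(root, "data", "my_data.csv")

# The separator we expect joinpath to have used on this machine.
sep = Sys.iswindows() ? "\\" : "/"

# splitpath is the inverse operation: it breaks a path into its components.
parts = splitpath(robust)

println(fragile)
println(robust)
println(parts)
```

On Linux and macOS the two strings happen to match; on Windows only the joinpath version uses backslashes.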

Suppose that you have a script named my_script.jl inside your project’s directory. You can build a robust representation of the filepath to my_script.jl as:

joinpath(root, "my_script.jl")
/home/runner/work/JuliaDataScience/JuliaDataScience/my_script.jl

joinpath also handles subfolders. Let’s now imagine a common situation where you have a folder named data/ in your project’s directory. Inside this folder there is a CSV file named my_data.csv. You can have the same robust representation of the filepath to my_data.csv as:

joinpath(root, "data", "my_data.csv")
/home/runner/work/JuliaDataScience/JuliaDataScience/data/my_data.csv
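Here is a small, self-contained sketch of the same layout in action. It uses a temporary directory as a stand-in for the project root and writes a made-up two-line CSV file, so neither the root nor the file contents come from the book’s project.

```julia
# Hypothetical project root: a fresh temporary directory.
root = mktempdir()

# Create the data/ subfolder inside the root.
data_dir = joinpath(root, "data")
mkpath(data_dir)

# Build the path to the CSV file and write some made-up content to it.
csv_path = joinpath(data_dir, "my_data.csv")
write(csv_path, "id,name\n1,john\n")

# Read the header line back through the same path.
first_line = readline(csv_path)

println(isfile(csv_path))
println(first_line)
println(basename(csv_path))
```

Because every path is built with joinpath, the same lines should work unchanged on Linux, macOS and Windows.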

It’s a good habit to pick up, because it’s very likely to save you or other people from problems later.



CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso