3.3 Filesystem

In data science, most projects are collaborative efforts. We share code, data, tables, figures and so on. Behind everything, there is the operating system (OS) filesystem. In an ideal world, code would run the same on different OSs. But that is not what actually happens. One instance of this is the difference between Windows paths, such as C:\user\john\, and Linux paths, such as /home/john. This is why it is important to discuss filesystem best practices.

Julia has native filesystem capabilities that handle the demands of different OSs. They are located in the Filesystem module of the core Base Julia library. This means that Julia provides what you need to write path-handling code that behaves the same way on any OS it supports.

Whenever you are dealing with files such as CSV files, Excel files or other Julia scripts, make sure that your code works with the filesystems of different OSs. This is easily accomplished with the joinpath and pwd functions.

The name pwd is short for print working directory, and the function returns a string containing the current working directory. One nice thing about pwd is that it is robust across OSs, i.e. it returns the correct string on Linux, macOS, Windows or any other supported OS. For example, let’s see what our current directory is and store it in a variable root:

root = pwd()
/home/runner/work/JuliaDataScience/JuliaDataScience
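As a quick sanity check, the value returned by pwd can be inspected with isabspath and isdir, both available in Base. This is only a minimal sketch. The printed path will differ on your machine, but each of the three checks should print true:

```julia
# pwd() returns the current working directory as a String
root = pwd()

# The result is an absolute path that points to an existing directory
println(root isa String)
println(isabspath(root))
println(isdir(root))
```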

The next step is to add the relative path from root to our desired file. Different OSs construct paths with subfolders in different ways: some use forward slashes /, while others use backslashes \. So we cannot simply concatenate our file’s relative path with the root string. For that, we have the joinpath function, which joins path components and filenames using the conventions of your specific OS.
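To see why plain concatenation falls short, the sketch below compares it with joinpath. The filename notes.txt is hypothetical and only serves as an illustration. The file does not need to exist, because both expressions only build strings:

```julia
root = pwd()

# Naive concatenation glues the two pieces together with no separator
naive = root * "notes.txt"

# joinpath inserts the separator that the current OS expects
robust = joinpath(root, "notes.txt")

println(naive)
println(robust)
```

Unless root already ends with a separator, the first string runs the directory name and the filename together, while the second one keeps them apart.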

Suppose you have a script named my_script.jl inside your project’s directory. You can have a robust representation of the filepath to my_script.jl as:

joinpath(root, "my_script.jl")
/home/runner/work/JuliaDataScience/JuliaDataScience/my_script.jl

joinpath also handles subfolders. Let’s now imagine a common situation where you have a folder named data/ in your project’s directory. Inside this folder there is a CSV file named my_data.csv. You can have the same robust representation of the filepath to my_data.csv as:

joinpath(root, "data", "my_data.csv")
/home/runner/work/JuliaDataScience/JuliaDataScience/data/my_data.csv
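The same idea works in the other direction. As a small sketch, the Base functions basename, dirname, splitext and splitpath take a path apart without hard-coding a separator. Nothing here touches the disk, so the data folder and the CSV file do not have to exist:

```julia
root = pwd()
path = joinpath(root, "data", "my_data.csv")

# Take the path apart again without hard-coding a separator
println(basename(path))            # file name
println(basename(dirname(path)))   # name of the enclosing folder
println(splitext(basename(path)))  # name and extension as a tuple

# Joining the components returned by splitpath rebuilds the path
rebuilt = joinpath(splitpath(path)...)
println(rebuilt == path)
```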

Always make sure that your code can run anywhere. It’s a good habit to pick up, because it’s very likely to save you problems later.
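Putting the pieces together, here is a minimal, self-contained sketch that writes and reads a file through a path built with joinpath. It assumes a scratch directory created by mktempdir instead of your real project folder, and the CSV contents are made up for illustration:

```julia
# mktempdir creates a scratch directory that is removed when the block exits
lines = mktempdir() do tmp
    # Build the folder and file paths without hard-coding a separator
    data_dir = joinpath(tmp, "data")
    mkpath(data_dir)
    file = joinpath(data_dir, "my_data.csv")

    # Write a tiny, made-up CSV and read it back line by line
    write(file, "id,value\n1,10\n2,20\n")
    readlines(file)
end

println(lines)
```

In a real project you would replace tmp with root, so that the same script finds data/my_data.csv on any collaborator’s machine.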



CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer and Lazaro Alonso