1.1 What is Data Science?

Data science is not only machine learning and statistics, and it’s not all about prediction. Alas, it is not even a discipline fully contained within STEM (Science, Technology, Engineering, and Mathematics) fields (Meng, 2019). But one thing that we can assert with high confidence is that data science is always about data. Our aims for this book are twofold: we focus on data, the backbone of data science, and we use the Julia programming language to process it.

We cover why Julia is an extremely effective language for data science in Section 2. For now, let’s turn our attention towards data.

1.1.1 Data Literacy

According to Wikipedia, the formal definition of data literacy is “the ability to read, understand, create, and communicate data as information.” We also like the informal idea that, being data literate, you won’t feel overwhelmed by data, but instead can use it to make the right decisions. Data literacy can be seen as a highly competitive skill to possess. In this book we’ll cover two aspects of data literacy:

  1. Data Manipulation with DataFrames.jl (Chapter 4) and DataFramesMeta.jl (Chapter 5). In these chapters you will learn how to:
    1. Read CSV and Excel data into Julia.
    2. Process data in Julia, that is, learn how to answer data questions.
    3. Filter and subset data.
    4. Handle missing data.
    5. Join multiple data sources together.
    6. Group and summarize data.
    7. Export data out of Julia to CSV and Excel files.
  2. Data Visualization with Makie.jl (Chapter 6). In this chapter you will learn how to:
    1. Plot data with different Makie.jl backends.
    2. Save visualizations in several formats such as PNG or PDF.
    3. Use different plotting functions to make diverse data visualizations.
    4. Customize visualizations with attributes.
    5. Use and create new plotting themes.
    6. Add \(\LaTeX\) elements to plots.
    7. Manipulate color and palettes.
    8. Create complex figure layouts.
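
To give a first taste of the data manipulation topics listed above, here is a minimal sketch using DataFrames.jl. The two small tables and all variable names (`grades`, `teachers`, `by_class`, and so on) are invented for illustration and are not taken from the later chapters, which cover these operations in detail. The sketch touches only missing data, filtering, joining, and grouping; reading files and plotting are left out.

```julia
using DataFrames
using Statistics: mean

# Hypothetical toy data, made up for this example only.
grades = DataFrame(
    name = ["Sally", "Bob", "Alice", "Hank"],
    class = ["A", "A", "B", "B"],
    grade = [8.0, missing, 9.0, 5.0],
)
teachers = DataFrame(class = ["A", "B"], teacher = ["Ms. X", "Mr. Y"])

# Handle missing data: drop rows where :grade is missing.
complete = dropmissing(grades, :grade)

# Filter and subset: keep rows with a grade of at least 6.0.
passing = filter(:grade => >=(6.0), complete)

# Join multiple data sources on the shared :class column.
joined = innerjoin(passing, teachers; on = :class)

# Group and summarize: mean grade per class.
by_class = combine(groupby(complete, :class), :grade => mean => :mean_grade)
```

Each step returns a new `DataFrame` and leaves its input untouched, which makes it easy to inspect intermediate results while exploring data.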


CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso