Column Transformation - Julia Data Science

5.3 Column Transformation

Whereas the @select macro variants performs column selection, the @transform macro variants do not perform any column selection. It can either overwrite existent columns or create new columns that will be added to the right of our DataFrame.

For example, the previous operation on :grade can be invoked as a transformation with:

@rtransform df :grade_100 = :grade * 10

name	grade	grade_100
Sally	1.0	10.0
Bob	5.0	50.0
Alice	8.5	85.0
Hank	4.0	40.0
Bob	9.5	95.0
Sally	9.5	95.0
Hank	6.0	60.0

As you can see, @transform does not perform column selection, and the :grade_100 column is created as a new column and added to the right of our DataFrame.

DataFramesMeta.jl macros also support begin ... end statements. For example, suppose that you are creating two columns in a @transform macro:

@rtransform df :grade_100 = :grade * 10 :grade_5 = :grade / 2

name	grade	grade_100	grade_5
Sally	1.0	10.0	0.5
Bob	5.0	50.0	2.5
Alice	8.5	85.0	4.25
Hank	4.0	40.0	2.0
Bob	9.5	95.0	4.75
Sally	9.5	95.0	4.75
Hank	6.0	60.0	3.0

It can be cumbersome and difficult to read the performed transformations. To facilitate that, we can use begin ... end statements and put one transformation per line:

@rtransform df begin
    :grade_100 = :grade * 10
    :grade_5 = :grade / 2
end

name	grade	grade_100	grade_5
Sally	1.0	10.0	0.5
Bob	5.0	50.0	2.5
Alice	8.5	85.0	4.25
Hank	4.0	40.0	2.0
Bob	9.5	95.0	4.75
Sally	9.5	95.0	4.75
Hank	6.0	60.0	3.0

We can also use other columns in our transformations, which makes DataFramesMeta.jl more appealing than DataFrames.jl due to the easier syntax.

First, let’s revisit the leftjoined DataFrame from Chapter 4:

leftjoined = leftjoin(grades_2020(), grades_2021(); on=:name)

name	grade_2020	grade_2021
Sally	1.0	9.5
Hank	4.0	6.0
Bob	5.0	missing
Alice	8.5	missing

Additionally, we’ll replace the missing values with 5 (Section 4.9, also note the ! in in-place variant @rtransform!):

@rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)

name	grade_2020	grade_2021
Sally	1.0	9.5
Hank	4.0	6.0
Bob	5.0	5
Alice	8.5	5

This is how you calculate the mean of grades in both years using DataFramesMeta.jl:

@rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2

name	grade_2020	grade_2021	mean_grades
Sally	1.0	9.5	5.25
Hank	4.0	6.0	5.0
Bob	5.0	5	5.0
Alice	8.5	5	6.75

This is how you would perform it in DataFrames.jl:

transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)

name	grade_2020	grade_2021	mean_grades
Sally	1.0	9.5	5.25
Hank	4.0	6.0	5.0
Bob	5.0	5	5.0
Alice	8.5	5	6.75

As you can see, the case for easier syntax is not hard to argue for DataFramesMeta.jl.

5.2 Column Selection ← → 5.4 Row Selection

Support this project
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso