Whereas the @select macro variants performs column selection, the @transform macro variants do not perform any column selection. It can either overwrite existent columns or create new columns that will be added to the right of our DataFrame.
For example, the previous operation on :grade can be invoked as a transformation with:
@rtransform df :grade_100 = :grade * 10
| name | grade | grade_100 |
|---|---|---|
| Sally | 1.0 | 10.0 |
| Bob | 5.0 | 50.0 |
| Alice | 8.5 | 85.0 |
| Hank | 4.0 | 40.0 |
| Bob | 9.5 | 95.0 |
| Sally | 9.5 | 95.0 |
| Hank | 6.0 | 60.0 |
As you can see, @transform does not perform column selection, and the :grade_100 column is created as a new column and added to the right of our DataFrame.
DataFramesMeta.jl macros also support begin ... end statements. For example, suppose that you are creating two columns in a @transform macro:
@rtransform df :grade_100 = :grade * 10 :grade_5 = :grade / 2
| name | grade | grade_100 | grade_5 |
|---|---|---|---|
| Sally | 1.0 | 10.0 | 0.5 |
| Bob | 5.0 | 50.0 | 2.5 |
| Alice | 8.5 | 85.0 | 4.25 |
| Hank | 4.0 | 40.0 | 2.0 |
| Bob | 9.5 | 95.0 | 4.75 |
| Sally | 9.5 | 95.0 | 4.75 |
| Hank | 6.0 | 60.0 | 3.0 |
It can be cumbersome and difficult to read the performed transformations. To facilitate that, we can use begin ... end statements and put one transformation per line:
@rtransform df begin
:grade_100 = :grade * 10
:grade_5 = :grade / 2
end
| name | grade | grade_100 | grade_5 |
|---|---|---|---|
| Sally | 1.0 | 10.0 | 0.5 |
| Bob | 5.0 | 50.0 | 2.5 |
| Alice | 8.5 | 85.0 | 4.25 |
| Hank | 4.0 | 40.0 | 2.0 |
| Bob | 9.5 | 95.0 | 4.75 |
| Sally | 9.5 | 95.0 | 4.75 |
| Hank | 6.0 | 60.0 | 3.0 |
We can also use other columns in our transformations, which makes DataFramesMeta.jl more appealing than DataFrames.jl due to the easier syntax.
First, let’s revisit the leftjoined DataFrame from Chapter 4:
leftjoined = leftjoin(grades_2020(), grades_2021(); on=:name)
| name | grade_2020 | grade_2021 |
|---|---|---|
| Sally | 1.0 | 9.5 |
| Hank | 4.0 | 6.0 |
| Bob | 5.0 | missing |
| Alice | 8.5 | missing |
Additionally, we’ll replace the missing values with 5 (Section 4.9, also note the ! in in-place variant @rtransform!):
@rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)
| name | grade_2020 | grade_2021 |
|---|---|---|
| Sally | 1.0 | 9.5 |
| Hank | 4.0 | 6.0 |
| Bob | 5.0 | 5 |
| Alice | 8.5 | 5 |
This is how you calculate the mean of grades in both years using DataFramesMeta.jl:
@rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2
| name | grade_2020 | grade_2021 | mean_grades |
|---|---|---|---|
| Sally | 1.0 | 9.5 | 5.25 |
| Hank | 4.0 | 6.0 | 5.0 |
| Bob | 5.0 | 5 | 5.0 |
| Alice | 8.5 | 5 | 6.75 |
This is how you would perform it in DataFrames.jl:
transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)
| name | grade_2020 | grade_2021 | mean_grades |
|---|---|---|---|
| Sally | 1.0 | 9.5 | 5.25 |
| Hank | 4.0 | 6.0 | 5.0 |
| Bob | 5.0 | 5 | 5.0 |
| Alice | 8.5 | 5 | 6.75 |
As you can see, the case for easier syntax is not hard to argue for DataFramesMeta.jl.