5.4 Row Selection

We already covered two macros that operate on columns, @select and @transform.

Now let’s cover the only macro we need to operate on rows: @subset. It follows the same principles we’ve seen so far with DataFramesMeta.jl, except that the expression must return Boolean values: @subset keeps exactly the rows for which the expression is true.
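To make the "must return Booleans" rule concrete, here is a minimal sketch contrasting the two forms of the macro. The DataFrame below is hypothetical toy data, not the dataset used elsewhere in this chapter. With @subset the expression sees whole columns and must produce a Boolean vector; with @rsubset it runs once per row and must produce a single Bool.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data, chosen only for illustration
df = DataFrame(name = ["Alice", "Bob", "Hank", "Sally", "Anna"],
               grade = [8.5, 9.5, 4.0, 9.5, 6.0])

# @subset: :grade is the whole column, so we broadcast with .> to get a Bool vector
by_column = @subset df :grade .> 7

# @rsubset: :grade is a single value per row, so a plain > returns one Bool
by_row = @rsubset df :grade > 7

println(by_column.name)       # ["Alice", "Bob", "Sally"]
println(by_column == by_row)  # true
```

Both calls keep the same rows; the difference is only in what the expression receives and therefore what it has to return.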

Let’s keep only the grades above 7:

@rsubset df :grade > 7
name grade
Alice 8.5
Bob 9.5
Sally 9.5
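For comparison, the same row filter can be written without macros using plain DataFrames.jl functions. This is a sketch on hypothetical toy data; the macro call roughly corresponds to `subset` with a `ByRow` predicate, though we do not claim this is exactly how DataFramesMeta.jl expands it.

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data, chosen only for illustration
df = DataFrame(name = ["Alice", "Bob", "Hank", "Sally", "Anna"],
               grade = [8.5, 9.5, 4.0, 9.5, 6.0])

# DataFramesMeta.jl macro form (row-wise)
with_macro = @rsubset df :grade > 7

# DataFrames.jl subset: ByRow applies the predicate to each element of :grade
with_subset = subset(df, :grade => ByRow(>(7)))

# DataFrames.jl filter: the predicate also receives one value per row
with_filter = filter(:grade => >(7), df)

println(with_macro == with_subset == with_filter)  # true
```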

As you can see, @subset also has a row-wise (non-vectorized) variant, @rsubset, which we used above. Sometimes we want to mix and match vectorized and non-vectorized function calls. For instance, suppose that we want to keep only the grades above the mean grade:

@subset df :grade .> mean(:grade)
name grade
Alice 8.5
Bob 9.5
Sally 9.5

For this, we need the vectorized @subset macro with the > operator broadcast as .>, since we want an element-wise comparison, while the mean function needs to operate on the whole column of values.
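An alternative that keeps the row-wise macro is to compute the column-level statistic first and refer to it as an ordinary Julia variable, that is, a name without a leading colon, which the macro does not treat as a column. A minimal sketch on hypothetical toy data:

```julia
using DataFrames, DataFramesMeta, Statistics

# Hypothetical toy data, chosen only for illustration
df = DataFrame(name = ["Alice", "Bob", "Hank", "Sally", "Anna"],
               grade = [8.5, 9.5, 4.0, 9.5, 6.0])

# Column-wise: mean sees the whole column, .> compares element by element
above_mean = @subset df :grade .> mean(:grade)

# Row-wise alternative: compute the mean once, outside the macro
m = mean(df.grade)  # 7.5 for this toy data
above_mean_rows = @rsubset df :grade > m

println(above_mean.name)               # ["Alice", "Bob", "Sally"]
println(above_mean == above_mean_rows) # true
```

The first form is more compact; the second makes it explicit that the mean is computed once over the whole column rather than per row.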

@subset and @rsubset also support multiple conditions inside a begin ... end block; a row is kept only if all of the conditions are true:

@rsubset df begin
    :grade > 7
    startswith(:name, "A")
end
name grade
Alice 8.5
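Because the conditions in a begin ... end block are combined with a logical AND, the same filter can be written as a single expression joined with &&. The column-wise @subset version needs every condition broadcast. A sketch on hypothetical toy data, where "Anna" starts with "A" but is dropped because her grade is not above 7:

```julia
using DataFrames, DataFramesMeta

# Hypothetical toy data, chosen only for illustration
df = DataFrame(name = ["Alice", "Bob", "Hank", "Sally", "Anna"],
               grade = [8.5, 9.5, 4.0, 9.5, 6.0])

# Row-wise, one condition per line (all must hold)
both_block = @rsubset df begin
    :grade > 7
    startswith(:name, "A")
end

# Row-wise, conditions joined explicitly with &&
both_and = @rsubset df :grade > 7 && startswith(:name, "A")

# Column-wise: each condition must return a Bool vector, so we broadcast
both_vec = @subset df begin
    :grade .> 7
    startswith.(:name, "A")
end

println(both_block.name)  # ["Alice"]
```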


CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso