TIL that you can filter on conditions for more than one variable at a time using if_any() or if_all().
Turns out that across() is only meant for verbs that transform or summarize columns (like summarize() and mutate()), not for filter(). The if_any() and if_all() helpers were introduced in dplyr 1.0.4.
You use if_any() vs. if_all() depending on whether the condition needs to hold for at least one of the selected columns or for all of them.
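As a quick illustration of the difference, here is a minimal sketch on a made-up tibble. The `scores` data and its column names are hypothetical, not from dplyr or from the examples in this post.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
scores <- tibble(
  id     = 1:4,
  test_a = c(0, 5, 0, 7),
  test_b = c(0, 0, 3, 9)
)

# if_any(): keep rows where at least one selected column equals 0
any_zero <- scores %>%
  filter(if_any(starts_with("test"), ~ .x == 0))

# if_all(): keep rows where every selected column equals 0
all_zero <- scores %>%
  filter(if_all(starts_with("test"), ~ .x == 0))

any_zero$id  # 1 2 3
all_zero$id  # 1
```

Row 4 has no zeros, so neither filter keeps it. Rows 2 and 3 have a single zero each, so only if_any() keeps them.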
if_any():
library(dplyr)

# contains("m") selects the mpg and am columns
mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer)) %>%
  filter(if_any(contains("m"), ~ . == 0))
# A tibble: 19 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    21     6   258   110     3     3    19     1     0     3     1
 2    18     8   360   175     3     3    17     0     0     3     2
 3    18     6   225   105     2     3    20     1     0     3     1
 4    14     8   360   245     3     3    15     0     0     3     4
 5    24     4   146    62     3     3    20     1     0     4     2
 6    22     4   140    95     3     3    22     1     0     4     2
 7    19     6   167   123     3     3    18     1     0     4     4
 8    17     6   167   123     3     3    18     1     0     4     4
 9    16     8   275   180     3     4    17     0     0     3     3
10    17     8   275   180     3     3    17     0     0     3     3
11    15     8   275   180     3     3    18     0     0     3     3
12    10     8   472   205     2     5    17     0     0     3     4
13    10     8   460   215     3     5    17     0     0     3     4
14    14     8   440   230     3     5    17     0     0     3     4
15    21     4   120    97     3     2    20     1     0     3     1
16    15     8   318   150     2     3    16     0     0     3     2
17    15     8   304   150     3     3    17     0     0     3     2
18    13     8   350   245     3     3    15     0     0     3     4
19    19     8   400   175     3     3    17     0     0     3     2
if_all():
# TRUE for values above their column's mean
large <- function(x) {
  x > mean(x, na.rm = TRUE)
}

mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer)) %>%
  filter(if_all(contains("m"), large))
# A tibble: 10 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    21     6   160   110     3     2    16     0     1     4     4
 2    21     6   160   110     3     2    17     0     1     4     4
 3    22     4   108    93     3     2    18     1     1     4     1
 4    32     4    78    66     4     2    19     1     1     4     1
 5    30     4    75    52     4     1    18     1     1     4     2
 6    33     4    71    65     4     1    19     1     1     4     1
 7    27     4    79    66     4     1    18     1     1     4     1
 8    26     4   120    91     4     2    16     0     1     5     2
 9    30     4    95   113     3     1    16     1     1     5     2
10    21     4   121   109     4     2    18     1     1     4     2
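To make it clearer what if_all() is doing here, the following sketch checks which columns contains("m") selects and compares the result with a hand-written filter that joins the same conditions with `&`. The names `mtcars_int`, `matched`, `with_if_all` and `by_hand` are just illustrative.

```r
library(dplyr)

large <- function(x) {
  x > mean(x, na.rm = TRUE)
}

mtcars_int <- mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer))

# Which columns does contains("m") pick up?
matched <- mtcars_int %>%
  select(contains("m")) %>%
  names()
matched  # "mpg" "am"

# if_all() version, as in the example above
with_if_all <- mtcars_int %>%
  filter(if_all(contains("m"), large))

# Hand-written equivalent: one condition per column, joined with &
by_hand <- mtcars_int %>%
  filter(large(mpg) & large(am))

all.equal(with_if_all, by_hand)
```

In the same spirit, if_any() corresponds to joining the per-column conditions with `|`.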
Any tidyselect selection is allowed inside if_*(), just like inside across(), so they work very similarly.
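For instance, here is a small sketch of a few other tidyselect forms used with if_any() and if_all() on mtcars. The variable names (`both_one`, `either_one`, `all_positive`) are only for illustration.

```r
library(dplyr)

cars <- as_tibble(mtcars)

# A column range: vs and am are adjacent, so vs:am selects both
both_one <- cars %>%
  filter(if_all(vs:am, ~ .x == 1))

# An explicit vector of column names
either_one <- cars %>%
  filter(if_any(c(vs, am), ~ .x == 1))

# A predicate via where(): every numeric column must be positive
all_positive <- cars %>%
  filter(if_all(where(is.numeric), ~ .x > 0))

nrow(both_one)
nrow(either_one)
nrow(all_positive)
```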
Thanks to @gvelasq for his explanation.