Filter on conditions for more than one variable at a time

Published

March 21, 2022

TIL that you can filter on conditions for more than one variable at a time using if_any() or if_all().

It turns out that across() is meant for verbs that transform columns (like summarize() and mutate()), not for filter(). For filtering, if_any() and if_all() were introduced in dplyr 1.0.4.

Use if_any() when a row should be kept if the condition holds for at least one of the selected columns, and if_all() when it must hold for every selected column.
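To make the difference concrete before moving to mtcars, here is a minimal sketch on a made-up three-row tibble. The `scores` data and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data: three rows, two numeric columns
scores <- tibble(
  id = c("a", "b", "c"),
  x  = c(1, 5, 9),
  y  = c(2, 8, 3)
)

# if_any(): keep rows where at least one of x, y is greater than 4
any_rows <- scores %>% filter(if_any(c(x, y), ~ .x > 4))

# if_all(): keep rows where both x and y are greater than 4
all_rows <- scores %>% filter(if_all(c(x, y), ~ .x > 4))

any_rows$id  # "b" "c"
all_rows$id  # "b"
```

Row "c" passes if_any() because x is 9, but fails if_all() because y is only 3.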

if_any():

library(dplyr)

# Keep rows where at least one column whose name contains "m" (mpg, am) equals 0
mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer)) %>%
  filter(if_any(contains("m"), ~ . == 0))
# A tibble: 19 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    21     6   258   110     3     3    19     1     0     3     1
 2    18     8   360   175     3     3    17     0     0     3     2
 3    18     6   225   105     2     3    20     1     0     3     1
 4    14     8   360   245     3     3    15     0     0     3     4
 5    24     4   146    62     3     3    20     1     0     4     2
 6    22     4   140    95     3     3    22     1     0     4     2
 7    19     6   167   123     3     3    18     1     0     4     4
 8    17     6   167   123     3     3    18     1     0     4     4
 9    16     8   275   180     3     4    17     0     0     3     3
10    17     8   275   180     3     3    17     0     0     3     3
11    15     8   275   180     3     3    18     0     0     3     3
12    10     8   472   205     2     5    17     0     0     3     4
13    10     8   460   215     3     5    17     0     0     3     4
14    14     8   440   230     3     5    17     0     0     3     4
15    21     4   120    97     3     2    20     1     0     3     1
16    15     8   318   150     2     3    16     0     0     3     2
17    15     8   304   150     3     3    17     0     0     3     2
18    13     8   350   245     3     3    15     0     0     3     4
19    19     8   400   175     3     3    17     0     0     3     2

if_all():

# TRUE for values above their column's mean
large <- function(x) {
  x > mean(x, na.rm = TRUE)
}

mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer)) %>%
  filter(if_all(contains("m"), large))
# A tibble: 10 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    21     6   160   110     3     2    16     0     1     4     4
 2    21     6   160   110     3     2    17     0     1     4     4
 3    22     4   108    93     3     2    18     1     1     4     1
 4    32     4    78    66     4     2    19     1     1     4     1
 5    30     4    75    52     4     1    18     1     1     4     2
 6    33     4    71    65     4     1    19     1     1     4     1
 7    27     4    79    66     4     1    18     1     1     4     1
 8    26     4   120    91     4     2    16     0     1     5     2
 9    30     4    95   113     3     1    16     1     1     5     2
10    21     4   121   109     4     2    18     1     1     4     2
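One way to read the two examples above is as shorthand for writing out each column's condition and joining them with | or &. The sketch below checks that reading on the same data; the intermediate names (`cars_int`, `any_helper`, and so on) are mine and not part of dplyr.

```r
library(dplyr)

large <- function(x) {
  x > mean(x, na.rm = TRUE)
}

cars_int <- mtcars %>%
  as_tibble() %>%
  mutate(across(everything(), as.integer))

# contains("m") selects mpg and am, so if_any() should match an explicit |
any_helper   <- cars_int %>% filter(if_any(contains("m"), ~ . == 0))
any_explicit <- cars_int %>% filter(mpg == 0 | am == 0)

# ... and if_all() should match an explicit &
all_helper   <- cars_int %>% filter(if_all(contains("m"), large))
all_explicit <- cars_int %>% filter(large(mpg) & large(am))

identical(any_helper, any_explicit)
identical(all_helper, all_explicit)
```

If the equivalence holds, both identical() calls print TRUE. The helpers pay off once the selection covers more columns than you would want to type out by hand.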

Any tidyselect helper is allowed inside if_any() and if_all(), just as inside across(), so the two families work very similarly.
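As a rough illustration, here are a few other ways to pick the columns. The particular conditions are arbitrary and only meant to show the selection syntax; the result names are hypothetical.

```r
library(dplyr)

cars <- as_tibble(mtcars)

# Name the columns directly
both_one <- cars %>% filter(if_all(c(vs, am), ~ .x == 1))

# Select by name prefix (disp and drat here)
big_d <- cars %>% filter(if_any(starts_with("d"), ~ .x > 400))

# Select by column type with a predicate
all_positive <- cars %>% filter(if_all(where(is.numeric), ~ .x > 0))

nrow(both_one)
nrow(big_d)
nrow(all_positive)
```

Since every column in mtcars is numeric, the last call requires every value in a row to be positive, which rules out any row where vs or am is 0.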

Thanks to @gvelasq for his explanation.