Logical subsetting of data.frame rows based on string prefixes

Time: 2019-08-01 19:31:47

Tags: r dplyr

Filtering a data.frame with a base::startsWith logical test

library(dplyr)
df <- tibble::rownames_to_column(mtcars, "Make") #sample df 
df_sub <- filter(df, startsWith(Make, c("Mas","Maz","Mer")))

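This produces an unexpected result: only one Mazda and 2 Mercs are returned, even though df contains more matches. Is there a similar approach that works? I think a regex would be overkill.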
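A likely explanation for the result described above: startsWith() recycles a shorter prefix vector along x, so each row is compared against only one of the three prefixes rather than all of them. The sketch below illustrates this with base R only. Building df with data.frame() instead of tibble::rownames_to_column() is an assumption made to keep the example free of package dependencies.

```r
# Rebuild the sample data with base R only (assumption: equivalent in content
# to the rownames_to_column() call in the question).
df <- data.frame(Make = rownames(mtcars), mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)
prefix <- c("Mas", "Maz", "Mer")

# startsWith() pairs row i with a single prefix, cycling through the vector.
recycled <- startsWith(df$Make, prefix)

# The same pairing written out explicitly with rep_len().
explicit <- startsWith(df$Make, rep_len(prefix, nrow(df)))

# TRUE if the recycled call and the explicit pairing agree.
identical(recycled, explicit)

# The makes that happen to line up with "their" prefix.
df$Make[recycled]
```

A row such as "Mazda RX4" is dropped simply because it is paired with "Mas" and never tested against "Maz".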

3 Answers:

Answer 0 (score: 3)

How about:

filter(df, substr(Make, 1, 3) %in% c("Mas","Maz","Mer"))

            Make  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1      Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3      Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
4       Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
5       Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
6      Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
7     Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
8     Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
9    Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
10 Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

Another possibility:

prefix <- c("Mas","Maz","Mer")
df[apply(sapply(prefix, function(x) startsWith(df$Make, x)), 1, any), ]

The second possibility can also be used inside dplyr::filter(), although it is not very tidy:

filter(df, apply(sapply(prefix, function(x) startsWith(Make, x)), 1, any))
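One way to make the idea above read more cleanly is to wrap it in a small helper. starts_with_any below is a hypothetical name, not a function from base R or dplyr; it tests each prefix separately and combines the results with a logical OR. The data.frame() construction of df is also an assumption, used so the sketch runs without extra packages.

```r
# Hypothetical helper: TRUE where x starts with at least one of the prefixes.
starts_with_any <- function(x, prefixes) {
  # One logical vector per prefix, each of length(x).
  hits <- lapply(prefixes, function(p) startsWith(x, p))
  # Combine element-wise with OR, starting from all FALSE.
  Reduce(`|`, hits, rep(FALSE, length(x)))
}

df <- data.frame(Make = rownames(mtcars), mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)
prefix <- c("Mas", "Maz", "Mer")

# Base R subsetting with the helper.
df_sub <- df[starts_with_any(df$Make, prefix), ]
nrow(df_sub)
```

Unlike the substr() version, this does not require all prefixes to have the same length. Inside dplyr the same helper could be called as filter(df, starts_with_any(Make, prefix)).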

Answer 1 (score: 3)

df[grepl("^Mas|^Maz|^Mer", df$Make),]
# OR, building the pattern from a vector of prefixes
df[grepl(paste(paste0("^", c("Mas","Maz","Mer")), collapse = "|"), df$Make),]
#            Make  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1      Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#2  Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#8      Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#9       Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#10      Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#11     Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#12    Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#13    Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#14   Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#31 Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
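The pattern construction can be checked against a non-regex test. The sketch below builds an equivalent anchored pattern from the prefix vector and compares it with a substr()/%in% comparison. It assumes the prefixes contain no regex metacharacters (such as . or +), which would otherwise need escaping, and it rebuilds df with base R as an assumption to stay self-contained.

```r
df <- data.frame(Make = rownames(mtcars), mtcars,
                 row.names = NULL, stringsAsFactors = FALSE)
prefix <- c("Mas", "Maz", "Mer")

# "^(Mas|Maz|Mer)": a single anchor applied to a group of alternatives.
pattern <- paste0("^(", paste(prefix, collapse = "|"), ")")

by_regex  <- grepl(pattern, df$Make)
by_substr <- substr(df$Make, 1, 3) %in% prefix

# TRUE if both tests flag the same rows.
identical(by_regex, by_substr)
```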

Answer 2 (score: 1)

We can also use str_detect (from stringr) to match a regular expression against the start of each string:

library(dplyr)
library(stringr)
df %>%
  filter(str_detect(Make, str_c(c("^Mas", "^Maz", "^Mer"), collapse = "|")))
#             Make  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1      Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#2  Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#3      Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#4       Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#5       Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#6      Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#7     Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#8     Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#9    Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#10 Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8