I need to subset a data.frame dynamically according to a list of rules. For example, rules like these:
df[col1] == 's'
df[col2] == 'z'
df[col3] == 'a' | df[col3] == 'b' | df[col3] == 'c'
Written statically, I would simply do:
df <- df[df[col1] == 's'
& df[col2] == 'z'
& (df[col3] == 'a' | df[col3] == 'b' | df[col3] == 'c'), ]
If all the rules are stored in a list, how can I achieve the same result:
rules <- list(col1 = c('s'), col2 = c('z'), col3 = c('a', 'b', 'c'))
I would like to do something like this:
df <- magic(df, rules)
Is something like this possible?
Answer 0 (score: 4)
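This isn't very general: the list elements are combined with AND (&), and the values inside each element are combined with OR (|). But that is exactly what your question asks for.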
df <- data.frame(col1 = c('a','s','x'),
col2 = c('a','z','s'),
col3 = c('a','c','b'),
stringsAsFactors = FALSE)
df[with(df, col1 == 's'
& col2 == 'z'
& (col3 == 'a' | col3 == 'b' | col3 == 'c')), ]
# col1 col2 col3
# 2 s z c
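The values inside each rule behave as ORs because %in% tests set membership: for a single column, x %in% c('a', 'b', 'c') gives the same logical vector as the chained == comparisons in the static version. A small check on a made-up vector (col3 here is a hypothetical standalone vector, not the data frame column):

```r
# Hypothetical vector standing in for one column
col3 <- c('a', 'c', 'b', 'd')

# Static form: one == comparison per allowed value, joined with |
via_or <- col3 == 'a' | col3 == 'b' | col3 == 'c'

# Rule form: a single membership test against the vector of allowed values
via_in <- col3 %in% c('a', 'b', 'c')

via_or                   # [1]  TRUE  TRUE  TRUE FALSE
identical(via_or, via_in)  # [1] TRUE
```

One difference worth knowing: for NA values the == chain yields NA (which, used as a row index, produces a row of NAs), whereas %in% yields FALSE, so the row is simply dropped.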
rules <- list(col1 = c('s'), col2 = c('z'), col3 = c('a', 'b', 'c'))
df[Reduce(`&`, Map(`%in%`, df, rules)), ]
# col1 col2 col3
# 2 s z c
Wrapped up as the requested magic function:
magic <- function(data, rules) {
data[Reduce(`&`, Map(`%in%`, data, rules)), ]
}
magic(df, rules)
# col1 col2 col3
# 2 s z c
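To see what Reduce(`&`, Map(`%in%`, data, rules)) does, it can be split into its two steps. This sketch reuses the df and rules from above and only adds the intermediate names per_col and keep for illustration:

```r
df <- data.frame(col1 = c('a','s','x'),
                 col2 = c('a','z','s'),
                 col3 = c('a','c','b'),
                 stringsAsFactors = FALSE)
rules <- list(col1 = c('s'), col2 = c('z'), col3 = c('a', 'b', 'c'))

# Step 1: Map() pairs the i-th column of df with the i-th rule (by position,
# not by name) and applies %in% to each pair: one logical vector per column.
per_col <- Map(`%in%`, df, rules)
per_col$col3   # [1] TRUE TRUE TRUE

# Step 2: Reduce() folds `&` over those vectors, so a row is kept only
# if it satisfies every rule.
keep <- Reduce(`&`, per_col)
keep           # [1] FALSE  TRUE FALSE

df[keep, ]
```

Because the pairing is positional, this first version gives wrong results when the rules are listed in a different order from the columns or when some columns have no rule; the second version below addresses both cases.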
Edit: version 2
This version should handle 1) columns that have no rule and/or 2) rules that are not listed in the same order as the columns:
magic <- function(data, rules) {
rules <- rules[names(data)]
idx <- Map(`%in%`, data, rules)
idx[is.na(names(rules))] <- list(rep(TRUE, nrow(data)))
data[Reduce(`&`, idx), ]
}
df <- data.frame(col1 = c('a','s','x'),
col2 = c('a','z','s'),
colx = rnorm(3),
col3 = c('a','c','b'),
stringsAsFactors = FALSE)
rules <- list(col2 = c('z'), col1 = c('s'), col3 = c('a', 'b', 'c'))
magic(df, rules)
# col1 col2 colx col3
# 2 s z -1.374339 c
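The first two lines of version 2 are where the alignment happens. The sketch below traces them on the same example; colx uses fixed made-up values instead of rnorm(3) so the run is reproducible, and aligned, idx and keep are illustrative names only:

```r
df <- data.frame(col1 = c('a','s','x'),
                 col2 = c('a','z','s'),
                 colx = c(0.5, -1.2, 2.0),  # fixed values instead of rnorm(3)
                 col3 = c('a','c','b'),
                 stringsAsFactors = FALSE)
rules <- list(col2 = c('z'), col1 = c('s'), col3 = c('a', 'b', 'c'))

# Indexing the list by the data's column names reorders the rules to match
# the columns; a column without a rule becomes a NULL entry named NA.
aligned <- rules[names(df)]
names(aligned)   # names: "col1", "col2", NA, "col3"

# x %in% NULL is all FALSE, so left as is, colx would drop every row.
idx <- Map(`%in%`, df, aligned)
idx$colx         # [1] FALSE FALSE FALSE

# Replace the entries for rule-less columns with all-TRUE vectors.
idx[is.na(names(aligned))] <- list(rep(TRUE, nrow(df)))
keep <- Reduce(`&`, idx)
keep             # [1] FALSE  TRUE FALSE
```

A side effect of rules[names(data)] is that a rule naming a column that does not exist in the data is silently ignored rather than reported.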
More tests:
magic(mtcars, list(gear = 4, carb = 1:2))
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2