How to convert a character vector into a regular expression for data frame indexing

Asked: 2014-02-07 05:03:51

Tags: regex r grep dataframe

I have the following data frame:

    df <- structure(list(
      ID = c(9000099L, 9000296L, 9000622L, 9000798L, 9001104L, 9001400L),
      VERSION = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "1.2.1", class = "factor"),
      V01SF1 = c(1L, 2L, 2L, 3L, 2L, 1L),
      V01SF2 = c(3L, 3L, 3L, 3L, 3L, 3L),
      V01BD1 = c(2L, 3L, 3L, 2L, 3L, 3L),
      V01BD2 = c(5L, 5L, 5L, 3L, 5L, 5L)),
      .Names = c("ID", "VERSION", "V01SF1", "V01SF2", "V01BD1", "V01BD2"),
      row.names = c(NA, 6L), class = "data.frame")

    > df
           ID VERSION V01SF1 V01SF2 V01BD1 V01BD2
    1 9000099   1.2.1      1      3      2      5
    2 9000296   1.2.1      2      3      3      5
    3 9000622   1.2.1      2      3      3      5
    4 9000798   1.2.1      3      3      2      3
    5 9001104   1.2.1      2      3      3      5
    6 9001400   1.2.1      1      3      3      5

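I want to index this data frame by the "VERSION" column and by the columns whose names contain SF or BD. I have a vector that I would like to use as the search pattern against the names of df:

    vars <- c("SF", "BD")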

I can easily do the indexing for VERSION:

    df[grep("SION", names(df), value = TRUE)]


      VERSION
    1   1.2.1
    2   1.2.1
    3   1.2.1
    4   1.2.1
    5   1.2.1
    6   1.2.1

How can I add the elements of vars to the grep pattern used in grep("SION", names(df), value = TRUE)? The resulting code should give the same output as df[grep("SION|SF|BD", names(df), value = TRUE)]:

   VERSION V01SF1 V01SF2 V01BD1 V01BD2
 1   1.2.1      1      3      2      5
 2   1.2.1      2      3      3      5
 3   1.2.1      2      3      3      5
 4   1.2.1      3      3      2      3
 5   1.2.1      2      3      3      5
 6   1.2.1      1      3      3      5

Many thanks.

3 answers:

Answer 0: (score: 3)

Try this:

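vars <- c("SF", "BD")
version <- "VERSION"

# Join all fragments into a single alternation pattern
pattern <- paste(c(version, vars), collapse = "|")

> pattern
[1] "VERSION|SF|BD"

# value = TRUE returns the matching column names instead of their positions
ind <- grep(pattern, names(df), value = TRUE)

> ind
[1] "VERSION" "V01SF1"  "V01SF2"  "V01BD1"  "V01BD2"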

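The trick is that the first argument of grep is just a character string containing a regular expression. If you pass a vector of several patterns, only the first element is used (with a warning), so you use paste with collapse = "|" to build a single regular expression that matches any of the fragments. Now you can index the data.frame: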

dfx = df[, ind]


> dfx
  VERSION V01SF1 V01SF2 V01BD1 V01BD2
1   1.2.1      1      3      2      5
2   1.2.1      2      3      3      5
3   1.2.1      2      3      3      5
4   1.2.1      3      3      2      3
5   1.2.1      2      3      3      5
6   1.2.1      1      3      3      5
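
The steps in this answer can be wrapped into a small function. The following is only a sketch: select_by_fragments is a hypothetical name, not something from the answer, and the data frame is a shortened stand-in for the one in the question. It also adds drop = FALSE, because df[, ind] returns a plain vector rather than a data frame when only one column matches.

```r
# Shortened stand-in for the data frame in the question
df <- data.frame(
  ID      = c(9000099L, 9000296L),
  VERSION = factor(c("1.2.1", "1.2.1")),
  V01SF1  = c(1L, 2L),
  V01SF2  = c(3L, 3L),
  V01BD1  = c(2L, 3L),
  V01BD2  = c(5L, 5L)
)

# Hypothetical helper: keep the columns whose names match any fragment
select_by_fragments <- function(data, fragments) {
  pattern <- paste(fragments, collapse = "|")       # e.g. "VERSION|SF|BD"
  keep <- grep(pattern, names(data), value = TRUE)  # matching column names
  # drop = FALSE keeps a data frame even when only one column matches
  data[, keep, drop = FALSE]
}

res <- select_by_fragments(df, c("VERSION", "SF", "BD"))
names(res)
# [1] "VERSION" "V01SF1"  "V01SF2"  "V01BD1"  "V01BD2"

one <- select_by_fragments(df, "SION")
class(one)               # "data.frame"
class(df[, "VERSION"])   # "factor": the single column was dropped to a vector
```

Note that the fragments are treated as regular expressions, so characters such as "." or "(" in a fragment keep their special meaning.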

Answer 1: (score: 2)

Like this:

vars <- c("SF","BD")
vars
#[1] "SF" "BD"

df[grepl(paste(c("SION",vars),collapse="|"),names(df))]

#  VERSION V01SF1 V01SF2 V01BD1 V01BD2
#1   1.2.1      1      3      2      5
#2   1.2.1      2      3      3      5
#3   1.2.1      2      3      3      5
#4   1.2.1      3      3      2      3
#5   1.2.1      2      3      3      5
#6   1.2.1      1      3      3      5
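
One place where the choice between grepl (as here) and grep starts to matter is when you want to exclude the matching columns and nothing matches. The sketch below uses a made-up three-column data frame and the made-up fragments "XX" and "YY" purely for illustration.

```r
# Made-up data frame; none of its column names contain "XX" or "YY"
df <- data.frame(ID = 1:2, VERSION = c("1.2.1", "1.2.1"), V01SF1 = c(1L, 2L))

pattern <- paste(c("XX", "YY"), collapse = "|")

idx  <- grep(pattern, names(df))    # integer(0): positions of matches
mask <- grepl(pattern, names(df))   # FALSE FALSE FALSE: one flag per column

# Excluding the matching columns:
ncol(df[-idx])    # 0: negating an empty index selects no columns at all
ncol(df[!mask])   # 3: every column is kept, as intended
```

For plain selection (no negation) both forms give the same columns, so this only matters if the code is later adapted to drop columns.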

Answer 2: (score: 1)

First define s as:

s <- c("SION", vars)

Now try any of these:

g <- sapply(s, grepl, names(df))
df[ apply(g, 1, any) ]

df[ unlist(sapply(s, grep, names(df))) ]

df[ unlist(Vectorize(function(s) grep(s, names(df)))(s)) ]

pat <- paste(s, collapse = "|")
df[ grepl(pat, names(df)) ]
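
The variants above return the same columns for the data in the question, but the position-based ones and the logical ones can differ on other inputs. The sketch below illustrates this on a made-up four-column data frame with made-up fragments; it uses lapply instead of sapply so that the intermediate result is always a list, which is my substitution and not part of the answer.

```r
df <- data.frame(ID = 1:2, VERSION = c("1.2.1", "1.2.1"),
                 V01SF1 = c(1L, 2L), V01BD1 = c(2L, 3L))

s <- c("BD", "SION")   # fragments listed in a different order than the columns

# Position-based: columns come back in the order of the fragments
pos <- unlist(lapply(s, grep, names(df)))
names(df[pos])
# [1] "V01BD1"  "VERSION"

# Logical mask over the collapsed pattern: original column order is kept
pat <- paste(s, collapse = "|")
names(df[grepl(pat, names(df))])
# [1] "VERSION" "V01BD1"

# Overlapping fragments make the position-based form select a column twice
dup <- unlist(lapply(c("SF", "V01SF"), grep, names(df)))
dup                     # 3 3
ncol(df[dup])           # 2: V01SF1 appears twice
ncol(df[unique(dup)])   # 1: unique() removes the repeat
```

If the fragments may overlap or the column order matters, the collapsed pattern with grepl is the simpler choice.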