Edit: reproducible example at the bottom...
I am working with a large dataset (pooled NHAMCS data from the CDC):
> dim(ed0509)
[1] 174020 514
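I am having trouble using grep() to identify rows in the data frame based on patterns in several column variables (DIAG1, DIAG2, DIAG3). The patterns of interest come from the vector SSTI.list. The condition is that if a pattern is found in any of these columns, I want to pull that row number and ultimately use it to subset the data, creating a new categorical column in the dataset, SSTI.cat (0 or 1).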
SSTI.list <- c("035", "566", "60883", "6110", "6752", "6751", "680","681","682","683","684","684","685","686", "7048", "70583","7070", "7078", "7079", "7071", "7280", "72886", "7714", "7715", "7854", "9583", "99662", "99762", "9985")
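Because I am working with a very long list (> 1000 elements), I tried to automate this process with a for loop. The desired output is a set of new variables, each holding the list of rows for one value in the vector SSTI.list. My main trouble is running grep() inside the for loop, where I get the error: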
argument 'pattern' has length > 1 and only the first element will be used
What I have tried so far is:
diags <- c(ed0509$DIAG1,ed0509$DIAG2,ed0509$DIAG3)
for (i in SSTI.list){
assign(paste("var",i,sep=""),grep(paste("^",i,"",sep=""),diags,value=F))
}
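SSTI.comb would be the final list of rows (all of the var i vectors) in which the for loop found a pattern from SSTI.list. It is used to create the categorical variable SSTI.cat, which I then build with the data.table package:
SSTI.comb<-sort(as.numeric(SSTI.comb))
setDT(ed0509)[SSTI.comb,SSTI.cat:=1][,SSTI.cat:=0]
Edit for reproducibility, sorry...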
DIAG1=c("00000","4659-","0356-","5664-","771--","7715-","78791")
DIAG2=c("3829-","00000","00000","4659-","7854-","00000","566--")
DIAG3=c("9985-","00000","00000","00000","00000","00000","00000")
df<-data.frame(DIAG1,DIAG2,DIAG3)
SSTI.list <- c("035","9985","7854","771","7715")
for (i in SSTI.list){
assign(paste("var",i,sep=""),grep(paste("^",i,"",sep=""),diags,value=F))
}
Conceptually, I would like an output in which a new column variable appended to df indicates that rows 1, 3, 5, and 6 were identified as matching the patterns in SSTI.list.
Answer 0 (score: 1)
Here is an example with fake data that I wrote before you added your data. Let me know if this is what you had in mind:
SSTI.list <- c("035", "566", "60883", "6110", "6752", "6751", "680","681","682","683","684","684",
"685","686", "7048", "70583","7070", "7078", "7079", "7071", "7280", "72886",
"7714", "7715", "7854", "9583", "99662", "99762", "9985")
# Fake data
set.seed(10)
dat = as.data.frame(replicate(5, sample(c(SSTI.list, 1e5:(1e5+1000)),10)), stringsAsFactors=FALSE)
       V1     V2     V3     V4     V5
1  100493 100642 100861 100522 100254
2  100286 100555 100604 100066 100206
3  100409 100087 100767 100145   7048
4  100682 100583 100336 100895 100719
5  100058 100338 100387 100404 100227
6  100202 100410 100695 100737 100136
7  100252 100024 100829 100813   7078
8  100249 100241 100216 100947 100468
9  100600 100378 100758 100671 100076
10 100998 100824 100334 100482 100789
# Match any instance of a pattern within any element of the data
dat[apply(dat, 1, function(i) any(grepl(paste(SSTI.list, collapse="|"), i))),]
      V1     V2     V3     V4     V5
3 100409 100087 100767 100145   7048
4 100682 100583 100336 100895 100719   # "100682" matches "682" in SSTI.list
7 100252 100024 100829 100813   7078
# Match only if a data element is exactly the same as one of the patterns.
dat[apply(dat, 1, function(i) any(grepl(paste(paste0("^",SSTI.list,"$"), collapse="|"), i))),]
      V1     V2     V3     V4     V5
3 100409 100087 100767 100145   7048
7 100252 100024 100829 100813   7078
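For the exact-match case, a regular expression is not strictly needed. The following is a minimal sketch that uses `%in%` instead, assuming every column is stored as character. The three-row `dat` and the shortened `SSTI.list` here are hypothetical stand-ins, not the data above.

```r
# Hypothetical, shortened stand-ins for the pattern vector and the data
SSTI.list <- c("035", "566", "682", "7048", "7078")
dat <- data.frame(V1 = c("100409", "100682", "100252"),
                  V5 = c("7048", "100719", "7078"),
                  stringsAsFactors = FALSE)

# TRUE for a row when at least one of its elements is exactly equal to a code.
# "100682" is not flagged here, because %in% compares whole strings.
exact <- apply(dat, 1, function(row) any(row %in% SSTI.list))

dat[exact, ]
```

With a very long code list this may also be cheaper than building one large alternation pattern, although that has not been measured here.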
If you only want the row indices of the matching rows:
which(apply(dat, 1, function(i) any(grepl(paste(SSTI.list, collapse="|"), i))))
[1] 3 4 7
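Applied to the reproducible example from the question, the same idea might look like the sketch below. It assumes the codes should match at the start of each diagnosis string (the `^` anchor used in the question's loop) and that the DIAG columns are character. The names `pattern` and `hits` are introduced only for illustration. Matching each column separately, rather than concatenating the three columns into one vector, keeps the result aligned with the rows of `df`.

```r
DIAG1 <- c("00000", "4659-", "0356-", "5664-", "771--", "7715-", "78791")
DIAG2 <- c("3829-", "00000", "00000", "4659-", "7854-", "00000", "566--")
DIAG3 <- c("9985-", "00000", "00000", "00000", "00000", "00000", "00000")
df <- data.frame(DIAG1, DIAG2, DIAG3, stringsAsFactors = FALSE)
SSTI.list <- c("035", "9985", "7854", "771", "7715")

# A single anchored alternation: "^(035|9985|7854|771|7715)"
pattern <- paste0("^(", paste(SSTI.list, collapse = "|"), ")")

# One logical vector per column, combined with OR so that a row is
# flagged when any of its three diagnosis columns matches
per_column <- lapply(df[c("DIAG1", "DIAG2", "DIAG3")],
                     function(col) grepl(pattern, col))
hits <- Reduce(`|`, per_column)

# 0/1 indicator column, plus the indices of the matching rows
df$SSTI.cat <- as.integer(hits)
which(hits)
```

Because `grepl()` returns a full-length logical vector, the 0/1 column can be assigned in one step, with no need to set the 1s and the 0s separately.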