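From a data frame, I want all rows that contain a certain pattern, for example "A" or "36" or "1?2". I don't care which column matches the pattern, as long as there is a match somewhere in the row.
Data frame: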
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD2920 2011-01-27_R6 AB CG -0.183802 9.766334
AX-11086564 D04_CD5950 2011-02-09_R9 AB CG 0.162586 10.165051
AX-11086564 D07_CD6025 2011-02-10_R10 AB CG -0.397097 9.940238
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
AX-11086564 A01_CD2588 2011-01-27_R5 BB GG -1.208094 9.239801
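My actual data frame has many rows, and I don't want to hard-code their names. The patterns may get more complex, so I'd like to use regular expressions.
Code to read this data frame in R: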
data <- read.table(textConnection("
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD2920 2011-01-27_R6 AB CG -0.183802 9.766334
AX-11086564 D04_CD5950 2011-02-09_R9 AB CG 0.162586 10.165051
AX-11086564 D07_CD6025 2011-02-10_R10 AB CG -0.397097 9.940238
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
AX-11086564 A01_CD2588 2011-01-27_R5 BB GG -1.208094 9.239801
"), header = TRUE)
Answer 0 (score: 2)
You can use grepl, apply and rowSums:
> rowSums(apply(data, 2, grepl, pattern = "A")) > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> rowSums(apply(data, 2, grepl, pattern = "1?2")) > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> rowSums(apply(data, 2, grepl, pattern = "36")) > 0
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
> out <- rowSums(apply(data, 2, grepl, pattern = "36")) > 0
> data[out,]
aName bName pName call alleles logRatio strength
1 AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
6 AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
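The "1?2" result above is TRUE for every row because, read as a regular expression, the pattern means an optional "1" followed by "2", so any row containing a "2" matches. If the literal text "1?2" were intended instead (an assumption, since the question asks for regular expressions), grepl's fixed = TRUE argument would be one way to get it. A small sketch on a made-up vector x, which is not part of the original data:

```r
# Hypothetical example strings, chosen only to show the difference
x <- c("2011-02-10_R10", "A1?2B", "CG")

# As a regular expression: optional "1", then "2"
regex_hits <- grepl("1?2", x)

# As a literal string: the three characters "1?2" must appear as-is
literal_hits <- grepl("1?2", x, fixed = TRUE)

print(regex_hits)    # TRUE TRUE FALSE
print(literal_hits)  # FALSE TRUE FALSE
```

Escaping the question mark in the pattern ("1\\?2") should behave the same way as fixed = TRUE here.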
Note that apply coerces the data frame to a matrix (via as.matrix), so every column is searched as character strings.
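If the matrix coercion done by apply is a concern, the same idea can be written column by column. The following is only a sketch under that assumption: rows_matching is a hypothetical helper name, not something from the answer above, and the three-row data frame is a shortened copy of the question's data.

```r
# Hypothetical helper: one logical value per row, TRUE where any column
# of that row matches the pattern.
rows_matching <- function(df, pattern) {
  # Convert each column to character explicitly, then test it with grepl
  hits <- lapply(df, function(col) grepl(pattern, as.character(col)))
  # Combine the per-column results with an element-wise OR
  Reduce(`|`, hits)
}

data <- read.table(text = "
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
", header = TRUE, stringsAsFactors = FALSE)

keep <- rows_matching(data, "36")
print(data[keep, ])
```

Because each column is converted with as.character, factor columns are searched by their labels rather than their integer codes.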
Answer 1 (score: 2)
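Here I define a grep wrapper that searches within a data.frame and returns the indices of the matching rows:
search_data_frame <-
function(patt,data)
# grepl() tests every column of row i; any() collapses that to one value per row
which(vapply(seq_len(nrow(data)),function(i) any(grepl(patt,data[i,])),logical(1)))
Then you use it like this:
data[search_data_frame('36',data),]
aName bName pName call alleles logRatio strength
1 AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
6 AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076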
Note that I read your data with stringsAsFactors=FALSE; otherwise you should coerce your factors to character.
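For a data frame that has already been read with factor columns, the coercion mentioned above could look roughly like this. The two-column df is a made-up subset of the question's data, and is_fac is just an illustrative variable name:

```r
# Small example read with factors on purpose (stringsAsFactors = TRUE)
df <- read.table(text = "
bName call
F08_ADN103 AB
B05_CD3630 AA
", header = TRUE, stringsAsFactors = TRUE)

# Find the factor columns and replace them with character vectors
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

str(df)
```

After this step the columns hold plain strings, so grep and grepl see the labels themselves.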