I have the following problem: I have two txt files. The first one looks like this:
ABCG1
ABLIM1
ABP1
ACOT11
ACP5
It contains more than 700 strings. The second one looks like this:
1 2 3 4 5 6 GENE_NAME
0.01857 0.02975 0.02206 0.01847 0.01684 0.01588 NIPA2;NIPA2;NIPA2;NIPA2
0.81992 0.8168 0.76963 0.83116 0.78114 0.85544 MAN1B1
0.13053 0.12308 0.10654 0.11675 0.13664 0.10312 TSEN34;TSEN34
0.91888 0.93095 0.91498 0.91558 0.91126 0.91569 LRRC16A
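Its dimensions are 90+ x 640,000+.
I want to extract the rows of the second (tab-delimited) file that contain any of the values from the first file. I came up with something like this: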
# start from an empty data frame with the same columns as test
data <- test[0, ]
for (i in seq_len(nrow(test))) {
  # "gene_name" stands for one gene from the first file
  if (grepl("gene_name", test$GENE_NAME[i])) {
    data <- rbind(data, test[i, ])
  }
}
But the problem is that I would have to repeat this code more than 700 times. Is there a way to write something like this:
value = c(vector that contains my gene names)
string = (one of the strings of my table)
grepl(any(value), string)
I ran into a problem with any, because it treats the vector as logical rather than character.
Thanks in advance.
Answer 0 (score: 0)
Would this work for you?
value <- c("ABCG1",
"ABLIM1",
"ABP1",
"ACOT11",
"ACP5")
GENE_NAME <- c("ABCG1;NIPA2;NIPA2",
"ABLIM1",
"ABP1;ABCG1",
"ACOT11",
"TSEN34;TSEN34",
"ACP5",
"LRRC16A") # This is the test$GENE_NAME column
lapply(value, function(x) GENE_NAME[grepl(x, GENE_NAME)])
# [[1]]
# [1] "ABCG1;NIPA2;NIPA2" "ABP1;ABCG1"
#
# [[2]]
# [1] "ABLIM1"
#
# [[3]]
# [1] "ABP1;ABCG1"
#
# [[4]]
# [1] "ACOT11"
#
# [[5]]
# [1] "ACP5"
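The same per-gene grepl idea can be turned into a single row filter for the data frame, which is roughly what grepl(any(value), string) in the question was reaching for. The following is only a sketch: the small test data frame, its X1 column, and its last row are made up for illustration, and fixed = TRUE is an assumption so that gene names are treated as literal text rather than regular expressions.

```r
# Gene names from the first file (shortened for illustration)
value <- c("ABCG1", "ABLIM1", "ABP1", "ACOT11", "ACP5")

# Hypothetical stand-in for the second file; the last row is invented
# so that at least one row matches
test <- data.frame(
  X1 = c(0.01857, 0.81992, 0.13053, 0.91888, 0.5),
  GENE_NAME = c("NIPA2;NIPA2;NIPA2;NIPA2", "MAN1B1", "TSEN34;TSEN34",
                "LRRC16A", "ABP1;ABCG1"),
  stringsAsFactors = FALSE
)

# One logical vector per gene, then combine them with element-wise OR:
# a row is kept if any gene name occurs in its GENE_NAME string
per_gene <- lapply(value, grepl, x = test$GENE_NAME, fixed = TRUE)
keep <- Reduce(`|`, per_gene)

result <- test[keep, ]
print(result)
```

Note that this is still substring matching, so a gene such as ABP1 would also match a longer name that merely contains it.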
If you like, you can unlist it:
unlist(lapply(value, function(x) GENE_NAME[grepl(x, GENE_NAME)]))
# [1] "ABCG1;NIPA2;NIPA2" "ABP1;ABCG1" "ABLIM1" "ABP1;ABCG1" "ACOT11"
# [6] "ACP5"
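Because grepl matches substrings, a name like ABP1 would also hit a longer name that contains it. If exact gene names are needed, one possible alternative is to split GENE_NAME on ";" and test membership with %in%. This is a sketch under assumptions: the test data frame and the gene name "ABP10" are hypothetical, chosen only to show the difference, and ";" is assumed to be the only separator.

```r
# Gene names from the first file (shortened for illustration)
value <- c("ABCG1", "ABLIM1", "ABP1", "ACOT11", "ACP5")

# Hypothetical stand-in for the second file; "ABP10" is an invented name
# that contains "ABP1" as a substring
test <- data.frame(
  X1 = c(0.01857, 0.5, 0.7),
  GENE_NAME = c("NIPA2;NIPA2", "ABP1;ABCG1", "ABP10"),
  stringsAsFactors = FALSE
)

# Split each GENE_NAME entry into its individual gene names
genes_per_row <- strsplit(test$GENE_NAME, ";", fixed = TRUE)

# Keep a row if at least one of its gene names is exactly in value
keep_exact <- vapply(genes_per_row, function(g) any(g %in% value), logical(1))

result_exact <- test[keep_exact, ]
print(result_exact)
```

With this approach only the "ABP1;ABCG1" row is kept, while a substring search for ABP1 would also keep the "ABP10" row.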