I have data like this:
df<- structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
No = c(8L, 4L, 9L, 2L, 7L, 3L, 5L, 1L, 2L), NO1 = c(1L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L), Accessions = structure(c(6L,
5L, 1L, 3L, 2L, 7L, 4L, 9L, 8L), .Label = c("E9PCL5", "P00367",
"P05783", "P63104", "Q6DD88", "Q6FI13", "Q6P597-3", "Q7Z406-6",
"Q9BUA3"), class = "factor"), Accessions2 = structure(c(6L,
2L, 1L, 4L, 3L, 7L, 5L, 9L, 8L), .Label = c("B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0",
"F5GWF8; F5H6I7; Q6DD88", "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20",
"P05783; F8VZY9", "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6",
"Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13",
"Q6P597-3; Q6P597-2; Q6P597", "Q7Z406-2; Q7Z406-6", "Q9BUA3"
), class = "factor"), NO3 = c(1L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 0L)), .Names = c("Groups", "No", "NO1", "Accessions",
"Accessions2", "NO3"), class = "data.frame", row.names = c(NA,
-9L))
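I am trying to find the rows whose Accessions2 column contains a specific string, and then sum NO1 over those rows.
For example, I want to know in which rows F8WA69 and Q9BUA3 occur, so the result would be: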
Groups No NO1 Accessions Accessions2 NO3
1 8 1 Q6FI13 Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13 1
1 9 1 E9PCL5 B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0 0
and
Groups No NO1 Accessions Accessions2 NO3
1 3 1 Q6P597-3 Q6P597-3; Q6P597-2; Q9BUA3 0
1 1 1 Q9BUA3 Q9BUA3 0
1 2 1 Q7Z406-6 Q7Z406-2; Q9BUA3 0
Then I want to sum NO1 for each of them:
2 for the first and 3 for the second.
Answer (score: 1)
You can use a simple grepl (or grep) to find the rows that contain an ID.
For example:
grepl("F8WA69", df$Accessions2)
[1] TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
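A small sketch of how the two functions differ, assuming a minimal copy of the question's df that keeps only NO1 and Accessions2 (re-typed as a character column; it is a factor in the original dput). grepl gives one TRUE/FALSE per row, grep gives the indices of the matching rows, and fixed = TRUE makes the pattern a literal string. The IDs here contain only letters, digits and hyphens, so regex and literal matching agree; fixed = TRUE would only matter if an ID contained a metacharacter such as ".".

```r
# Minimal copy of the question's data: only the two columns used here,
# with Accessions2 as character (it is a factor in the original dput).
df <- data.frame(
  NO1 = c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L),
  Accessions2 = c(
    "Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13",
    "F5GWF8; F5H6I7; Q6DD88",
    "B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0",
    "P05783; F8VZY9",
    "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20",
    "Q6P597-3; Q6P597-2; Q6P597",
    "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6",
    "Q9BUA3",
    "Q7Z406-2; Q7Z406-6"
  ),
  stringsAsFactors = FALSE
)

# grepl(): one TRUE/FALSE per row; grep(): indices of the matching rows.
# fixed = TRUE treats "F8WA69" as a literal string, not a regular expression.
hits_lgl <- grepl("F8WA69", df$Accessions2, fixed = TRUE)
hits_idx <- grep("F8WA69", df$Accessions2, fixed = TRUE)

hits_lgl                              # TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
hits_idx                              # 1 3
identical(which(hits_lgl), hits_idx)  # TRUE
```

Either form works for subsetting: df[hits_lgl, ] and df[hits_idx, ] select the same rows.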
To subset the data:
df[grepl("F8WA69", df$Accessions2), ]
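One caveat: grepl matches substrings, so a pattern can also hit a row where it is only part of a longer accession (for example "Q7Z406" inside "Q7Z406-2"). A sketch of exact matching, assuming "; " is always the separator inside Accessions2: split each entry into its IDs with strsplit and test membership with %in%. has_id is a hypothetical helper introduced only for this example, the data frame is the same minimal two-column copy of the question's data, and as.character() is there because Accessions2 is a factor in the original dput.

```r
# Minimal copy of the question's data (NO1 and Accessions2 only).
df <- data.frame(
  NO1 = c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L),
  Accessions2 = c(
    "Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13",
    "F5GWF8; F5H6I7; Q6DD88",
    "B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0",
    "P05783; F8VZY9",
    "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20",
    "Q6P597-3; Q6P597-2; Q6P597",
    "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6",
    "Q9BUA3",
    "Q7Z406-2; Q7Z406-6"
  ),
  stringsAsFactors = FALSE
)

# Split every Accessions2 entry into its individual IDs, assuming "; " is
# always the separator; as.character() also handles the original factor column.
ids_per_row <- strsplit(as.character(df$Accessions2), "; ", fixed = TRUE)

# Hypothetical helper: TRUE for rows that list `id` exactly as one accession.
has_id <- function(id) {
  vapply(ids_per_row, function(ids) id %in% ids, logical(1))
}

grepl("Q7Z406", df$Accessions2, fixed = TRUE)  # row 9 matches as a substring of Q7Z406-2
has_id("Q7Z406")                               # all FALSE: no row lists Q7Z406 itself
df[has_id("F8WA69"), ]                         # rows 1 and 3, same as the grepl subset
```

For F8WA69 both approaches select the same rows; the difference only shows up for IDs that are prefixes of other IDs.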
If you want to iterate over several IDs and sum NO1 for each of them, you can use sapply:
sapply(c("F8WA69", "Q9BUA3"),
function(x) sum(df[grepl(x, df$Accessions2), ]$NO1))
F8WA69 Q9BUA3
2 1
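A sketch that combines the sapply idea with exact ID matching, under the same assumptions as above ("; " as separator, a minimal two-column copy of the question's data). vapply with numeric(1) is used instead of sapply so the result type is fixed, and the character vector of targets supplies the names. With the data exactly as posted, Q9BUA3 occurs only in row 8: the rows whose Accessions are Q6P597-3 and Q7Z406-6 have Accessions2 values "Q6P597-3; Q6P597-2; Q6P597" and "Q7Z406-2; Q7Z406-6" in the dput, so the total is 1 rather than the 3 expected in the question.

```r
# Minimal copy of the question's data (NO1 and Accessions2 only).
df <- data.frame(
  NO1 = c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L),
  Accessions2 = c(
    "Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13",
    "F5GWF8; F5H6I7; Q6DD88",
    "B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0",
    "P05783; F8VZY9",
    "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20",
    "Q6P597-3; Q6P597-2; Q6P597",
    "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6",
    "Q9BUA3",
    "Q7Z406-2; Q7Z406-6"
  ),
  stringsAsFactors = FALSE
)

ids_per_row <- strsplit(as.character(df$Accessions2), "; ", fixed = TRUE)
targets <- c("F8WA69", "Q9BUA3")

# For each target ID, sum NO1 over the rows whose accession list contains it exactly.
# vapply() fixes the result type; the character vector `targets` supplies the names.
no1_sums <- vapply(targets, function(id) {
  rows <- vapply(ids_per_row, function(ids) id %in% ids, logical(1))
  sum(df$NO1[rows])
}, numeric(1))

no1_sums  # F8WA69 = 2, Q9BUA3 = 1
```

On this data the result matches the grepl-based sapply output; the exact match only changes the totals when one ID is a substring of another.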