Get rows that contain a specific string

Time: 2018-02-01 19:24:36

Tags: r

I have data like this:

df<- structure(list(Groups = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
    No = c(8L, 4L, 9L, 2L, 7L, 3L, 5L, 1L, 2L), NO1 = c(1L, 1L, 
    1L, 2L, 1L, 1L, 1L, 1L, 1L), Accessions = structure(c(6L, 
    5L, 1L, 3L, 2L, 7L, 4L, 9L, 8L), .Label = c("E9PCL5", "P00367", 
    "P05783", "P63104", "Q6DD88", "Q6FI13", "Q6P597-3", "Q7Z406-6", 
    "Q9BUA3"), class = "factor"), Accessions2 = structure(c(6L, 
    2L, 1L, 4L, 3L, 7L, 5L, 9L, 8L), .Label = c("B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0", 
    "F5GWF8; F5H6I7; Q6DD88", "P00367; B3KV55; B4DGN5; P49448; F5GYQ4; H0YFJ0; F8WA20", 
    "P05783; F8VZY9", "P63104; E7EX24; H0YB80; B0AZS6; B7Z2E6", 
    "Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13", 
    "Q6P597-3; Q6P597-2; Q6P597", "Q7Z406-2; Q7Z406-6", "Q9BUA3"
    ), class = "factor"), NO3 = c(1L, 0L, 0L, 0L, 1L, 0L, 1L, 
    0L, 0L)), .Names = c("Groups", "No", "NO1", "Accessions", 
"Accessions2", "NO3"), class = "data.frame", row.names = c(NA, 
-9L))

I am trying to find the rows whose Accessions2 column contains a specific string, and then sum NO1 over those rows.

For example, I want to know which rows contain F8WA69 and which contain Q9BUA3, so the result would be:

Groups  No  NO1 Accessions  Accessions2 NO3
1   8   1   Q6FI13  Q16777; Q99878; F8WA69; H0YFX9; Q9BTM1; P20671; P0C0S8; Q6FI13  1
1   9   1   E9PCL5  B4DIW5; F8WA69; P56945; E9PCV2; F5GXA2; F5H855; E9PCL5; F5GXV6; F5H7Z0  0

Groups  No  NO1 Accessions  Accessions2 NO3
1   3   1   Q6P597-3    Q6P597-3; Q6P597-2; Q9BUA3  0
1   1   1   Q9BUA3  Q9BUA3  0
1   2   1   Q7Z406-6    Q7Z406-2; Q9BUA3    0

Then sum NO1 for each of them.

The first would be 2 and the second would be 3.

1 answer:

Answer 0: (score: 1)

You can use a simple grepl or grep to find the rows in which an ID is present.

For example:

grepl("F8WA69", df$Accessions2)
[1]  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
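As a small illustration of how the two functions differ: grepl returns one logical value per element, while grep returns the positions of the matching elements. The vector acc below is a shortened stand-in for df$Accessions2, made up for this sketch, and fixed = TRUE is an optional choice that treats the ID as a literal string rather than a regular expression.

```r
# Toy vector standing in for df$Accessions2 (shortened values, illustration only)
acc <- c("Q16777; F8WA69; Q6FI13", "F5GWF8; Q6DD88", "B4DIW5; F8WA69; E9PCL5")

# grepl() returns one TRUE/FALSE per element
hits_lgl <- grepl("F8WA69", acc, fixed = TRUE)

# grep() returns the integer positions of the matching elements
hits_idx <- grep("F8WA69", acc, fixed = TRUE)

print(hits_lgl)
print(hits_idx)
```

Either result can be used as a row index, for example df[hits_lgl, ] or df[hits_idx, ].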

To subset the data:

df[grepl("F8WA69", df$Accessions2), ]

If you want to iterate over multiple IDs and sum NO1, you can use sapply:

sapply(c("F8WA69", "Q9BUA3"),
       function(x) sum(df[grepl(x, df$Accessions2), ]$NO1))
F8WA69 Q9BUA3 
     2      1
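One caveat with grepl is that it matches substrings, so an ID such as Q6P597 would also match Q6P597-3. If exact ID matches are needed, a possible sketch is to split each cell on the semicolon and test membership with %in%. The data frame toy and the helper sum_no1_exact below are hypothetical names with made-up values, not part of the original question or answer.

```r
# Small stand-in for df (shortened, made-up values, illustration only)
toy <- data.frame(
  NO1 = c(1L, 2L, 4L),
  Accessions2 = c("Q6P597-3; Q6P597-2; Q6P597", "P05783; F8VZY9", "Q6P597-3; Q9BUA3"),
  stringsAsFactors = FALSE
)

# Split each cell into individual IDs; as.character() also handles factor columns
id_list <- strsplit(as.character(toy$Accessions2), ";\\s*")

# Hypothetical helper: sum NO1 over the rows whose ID list contains `id` exactly
sum_no1_exact <- function(id) {
  has_id <- vapply(id_list, function(ids) id %in% ids, logical(1))
  sum(toy$NO1[has_id])
}

exact <- vapply(c("Q6P597", "Q9BUA3"), sum_no1_exact, integer(1))

# For comparison: substring matching also counts the row that only holds Q6P597-3
partial <- sum(toy$NO1[grepl("Q6P597", toy$Accessions2, fixed = TRUE)])

print(exact)
print(partial)
```

In this toy data the exact match for Q6P597 counts only the first row, whereas the substring match also picks up the third row.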