R: counting matching strings

Time: 2016-04-12 22:23:44

Tags: r

I am trying to count the matching items between strings:

target_str = "a,b,c"
table1 = data.frame(name = c("p1","p2","p3","p4"),
                    str = c("a,b","a","d,e,f","a,a"))

For each row, I want to count how many of its items match target_str. I would like my output table to look like this:

name       matches
p1         2        #matches a and b
p2         1        #matches a
p3         0        #no matches
p4         1        #if has duplicate, count only once

I have about 1 million target_str values to count matches against, so speed is very important. Any suggestions are appreciated. Thanks in advance!

3 answers:

Answer 0: (score: 2)

target_str = "a,b,c"
split_str <- strsplit(target_str, split = ",")[[1]]
table1 = data.frame(name = c("p1","p2","p3","p4"),
                    str = c("a,b","a","d,e,f","a,a"))
data.frame(name = table1$name,
           matches = rowSums(sapply(split_str, grepl, x = table1$str)))

#   name matches
# 1   p1       2
# 2   p2       1
# 3   p3       0
# 4   p4       1
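One caveat with this approach: grepl treats each token as a regular expression and matches substrings, so the target token "a" would also match an entry such as "ab,x". Below is a minimal sketch of an exact-token variant, assuming tokens are separated by plain commas with no surrounding whitespace. The helper name count_exact_matches and the extra row p5 are hypothetical and not part of the original answer.

```r
# Hypothetical helper: count how many distinct target tokens appear
# as whole comma-separated tokens in each element of `strs`.
count_exact_matches <- function(strs, target_str) {
  targets <- unique(strsplit(target_str, split = ",", fixed = TRUE)[[1]])
  tokens  <- strsplit(as.character(strs), split = ",", fixed = TRUE)
  # `targets %in% tok` tests whole tokens, so duplicates in a row count once
  vapply(tokens, function(tok) sum(targets %in% tok), integer(1))
}

# p5 is an illustrative row added to show the substring pitfall
table2 <- data.frame(name = c("p1", "p2", "p3", "p4", "p5"),
                     str  = c("a,b", "a", "d,e,f", "a,a", "ab,x"))
table2$matches <- count_exact_matches(table2$str, "a,b,c")
```

With this variant the p5 row gets 0 matches, whereas a grepl-based count would report 1 because "ab" contains "a".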

Answer 1: (score: 1)

This should be fairly fast:

# target string modified to be a character vector:
target_str <- unlist(strsplit("a,b,c", split = ","))

# separate each observation's string (as.character guards against factor columns):
stringList <- strsplit(as.character(table1$str), split = ",")

# get counts, put into data.frame
# (testing target_str %in% i counts each target once, so "a,a" gives 1)
table1$Counts <- sapply(stringList, function(i) sum(target_str %in% i))
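Since the question mentions roughly one million target strings, it may help to split the table's strings only once and reuse the result for every target. The following is a sketch under that assumption; tokenList, count_for_target, and the three illustrative targets are hypothetical names and values, and no timings have been measured here.

```r
table1 <- data.frame(name = c("p1", "p2", "p3", "p4"),
                     str  = c("a,b", "a", "d,e,f", "a,a"))

# Split and de-duplicate the table's strings once, up front.
tokenList <- lapply(strsplit(as.character(table1$str), ",", fixed = TRUE), unique)

# Hypothetical helper: match counts for a single target string.
count_for_target <- function(target) {
  tgt <- strsplit(target, ",", fixed = TRUE)[[1]]
  vapply(tokenList, function(tok) sum(tok %in% tgt), integer(1))
}

# Illustrative targets; the real vector would hold about a million entries.
targets <- c("a,b,c", "d", "x,y")
counts <- sapply(targets, count_for_target)   # one column per target
rownames(counts) <- as.character(table1$name)
```

Each column of counts then holds the per-row match counts for one target string, with rows labelled by name.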

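Answer 2: (score: 1)

This cbinds the counts to the first column, which is kept as a data frame by using drop = FALSE. The counts are summed from successive tests of "in-ness" with grepl.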

cbind( table1[ ,1,drop=FALSE], counts=rowSums(sapply( scan(text=target_str, sep= ",", what=""),  function(t) { grepl( t, table1$str)})) )
Read 3 items
  name counts
a   p1      2
b   p2      1
c   p3      0
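The effect of drop = FALSE mentioned above can be checked directly. This is a small sketch using the question's table1; the variable names as_vector and as_frame are only illustrative.

```r
table1 <- data.frame(name = c("p1", "p2", "p3", "p4"),
                     str  = c("a,b", "a", "d,e,f", "a,a"))

# Without drop = FALSE, selecting a single column returns a bare vector.
as_vector <- table1[, 1]
# With drop = FALSE, the result stays a one-column data frame,
# so cbind() keeps the column name "name".
as_frame <- table1[, 1, drop = FALSE]

is.data.frame(as_vector)
is.data.frame(as_frame)
```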