Question

I am trying to count the matching items between strings:

target_str = "a,b,c"
table1 = data.frame(name = c("p1","p2","p3","p4"),
                    str = c("a,b","a","d,e,f","a,a"))

I want to count the number of matches against target_str. I would like my output table to look like this:

name       matches
p1         2        #matches a and b
p2         1        #matches a
p3         0        #no matches
p4         1        #if has duplicate, count only once

I have about 1 million target_str values to count matches for, so speed is very important. Any suggestions are appreciated. Thanks in advance!

Answer 1

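target_str = "a,b,c"
# split the target into its individual items
split_str <- strsplit(target_str, split = ",")[[1]]
table1 = data.frame(name = c("p1","p2","p3","p4"),
                    str = c("a,b","a","d,e,f","a,a"))
# one logical column per target item (grepl matches substrings), summed per row
data.frame(name = table1$name,
           matches = rowSums(sapply(split_str, grepl, x = table1$str)))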

#   name matches
# 1   p1       2
# 2   p2       1
# 3   p3       0
# 4   p4       1
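
One caveat: grepl matches substrings, so this works for the single-letter items in the question but could over-count if items can be longer than one character. A minimal sketch of the difference, using a made-up strs vector that is not from the question:

```r
# Hypothetical data: the first string contains the item "ab", not "a" or "b".
targets <- c("a", "b", "c")
strs <- c("ab,x", "a,b")

# Substring matching: "a" and "b" are both found inside "ab".
substring_counts <- rowSums(sapply(targets, grepl, x = strs, fixed = TRUE))

# Exact item matching: split each string on commas and compare whole items.
tokens <- strsplit(strs, split = ",", fixed = TRUE)
token_counts <- vapply(tokens, function(tok) sum(targets %in% tok), integer(1))

print(substring_counts)
print(token_counts)
```

Here the substring approach reports 2 matches for "ab,x", while the split-and-compare approach reports 0.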

Answer 2

This should be fairly fast:

# target string modified to be a character vector:
target_str <- unlist(strsplit(c("a,b,c"), split=","))

# split each observation's string into its items:
stringList <- strsplit(as.character(table1$str), split=",")

# get counts (each target item counted at most once per row), put into data.frame
table1$Counts <- sapply(stringList, function(i) sum(target_str %in% i))
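
Since the question mentions about 1 million target strings, it may help to split and de-duplicate the table's strings only once and reuse them for every target. The following is a rough sketch under that assumption; target_strs, row_tokens, count_matches and match_matrix are illustrative names, and the extra targets are made up:

```r
table1 <- data.frame(name = c("p1", "p2", "p3", "p4"),
                     str = c("a,b", "a", "d,e,f", "a,a"),
                     stringsAsFactors = FALSE)

# Hypothetical set of targets; only "a,b,c" comes from the question.
target_strs <- c("a,b,c", "d,x", "a")

# Split and de-duplicate each row's items once, outside the per-target loop.
row_tokens <- lapply(strsplit(table1$str, split = ",", fixed = TRUE), unique)

# Count, for one target string, how many distinct items of each row it contains.
count_matches <- function(target) {
  targets <- unique(strsplit(target, split = ",", fixed = TRUE)[[1]])
  vapply(row_tokens, function(tok) sum(tok %in% targets), integer(1))
}

# One column per target string, one row per table entry.
match_matrix <- vapply(target_strs, count_matches, integer(nrow(table1)))
rownames(match_matrix) <- table1$name
print(match_matrix)
```

Whether this is fast enough for 1 million targets has not been measured here; it only avoids re-splitting table1$str for every target.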

Answer 3

This cbinds the counts to the first column, which is kept as a data frame with drop = FALSE. The counts are summed from successive tests for "in-ness" with grepl:

cbind(table1[, 1, drop = FALSE],
      counts = rowSums(sapply(scan(text = target_str, sep = ",", what = ""),
                              function(t) grepl(t, table1$str))))
Read 3 items
  name counts
a   p1      2
b   p2      1
c   p3      0

Counting matching strings in R