Question

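I have a data.frame in which some variables contain text strings. My goal is to count the number of unique occurrences of a given NUMBER within each individual string.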

Other posts suggest this can be done with stringr: "How to calculate the number of occurrence of a given character in each row of a column of strings?", "calculate the total number of occurrence of a list of keyword in a string column" and "count number of numbers (not digits) in a string".

For example (Example 1):

q.data <- data.frame(number=1:4, 
                     string=c("1", "12", "3", "31"))

stringr::str_count(q.data$string, c("1")) 

# gives (1,1,0,1)

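This returns c(1, 1, 0, 1). What I actually want is a new column containing c(1), indicating that the number "1" on its own occurs once. I then want to extend this to multiple keywords, for example: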

Example 2:

stringr::str_count(q.data$string, c("1", "31"))

Here the new column should be c(2), indicating that these numbers occur twice.
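For illustration, here is a base-R sketch (it does not use stringr; the variable names are hypothetical) of the two kinds of counting involved, using the same q.data. str_count() counts every substring match within each string, so "1" is also found inside "12" and "31". Comparing whole values with == or %in% counts only the rows whose entire string equals a keyword, which matches the c(1) and c(2) described above. Note also that str_count() pairs each pattern with the corresponding string rather than adding up over keywords, so passing c("1", "31") against four strings is either recycled or rejected, depending on the stringr version.

```r
q.data <- data.frame(number = 1:4,
                     string = c("1", "12", "3", "31"),
                     stringsAsFactors = FALSE)

# Substring counting per row (a base-R analogue of str_count with a fixed pattern):
# gregexpr() finds every match, regmatches() returns character(0) for rows without one
per_row <- lengths(regmatches(q.data$string,
                              gregexpr("1", q.data$string, fixed = TRUE)))
# per_row is 1 1 0 1
total_substring <- sum(per_row)  # 3: "1" occurs inside "1", "12" and "31"

# Whole-value matching: only rows whose entire string equals a keyword
exact_one <- sum(q.data$string == "1")               # 1
exact_multi <- sum(q.data$string %in% c("1", "31"))  # 2
```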

Any help would be appreciated.

Answer 1

You can use data.table:

# load the library and convert q.data to a data.table in place
library(data.table)
setDT(q.data)

# Count occurrences of "1":
q.data[string %in% "1", .N] # string == "1" could have been used too

# Count occurrences of values in a vector:
q.data[string %in% c("1", "31"), .N]

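.N counts the number of rows. The expression before the comma filters the data, and the %in% operator checks whether each element is contained in another set.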
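As a rough base-R equivalent of that filter-then-count pattern (a sketch for comparison, not how data.table works internally), the logical vector below plays the role of the expression before the comma and counting its TRUE values plays the role of .N. The names keywords, keep and per_keyword are hypothetical, and the per-keyword breakdown via table() goes beyond the original answer.

```r
q.data <- data.frame(number = 1:4,
                     string = c("1", "12", "3", "31"),
                     stringsAsFactors = FALSE)
keywords <- c("1", "31")

# Logical filter, analogous to the i expression in q.data[string %in% keywords, ...]
keep <- q.data$string %in% keywords

# Number of matching rows, analogous to .N
n_rows <- sum(keep)  # 2

# Optional: rows matched per keyword; factor levels keep keywords with zero matches
per_keyword <- table(factor(q.data$string[keep], levels = keywords))
```

Within data.table itself, q.data[string %in% keywords, .N, by = string] gives a similar per-keyword breakdown, although keywords with no matching rows are simply absent from its result.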

See ?data.table and ?match for more details.

Answer 2

You can put the strings you want to check in a list and then use sapply. I'm not sure what output structure you want, but here is a start:

checklist <- list("1", c("1", "31"))

sapply(checklist, function(x) {
  sum(x %in% q.data$string)
})

[1] 1 2
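One thing to be aware of: sum(x %in% q.data$string) counts how many of the keywords appear at least once in the column, not how many rows match. The two coincide for the sample data only because every value is unique. A minimal sketch, using a hypothetical q.data2 with an extra row that repeats "1", shows the difference; swapping the operands of %in% counts matching rows instead.

```r
# Hypothetical data: same as q.data plus a fifth row that repeats "1"
q.data2 <- data.frame(number = 1:5,
                      string = c("1", "12", "3", "31", "1"),
                      stringsAsFactors = FALSE)
checklist <- list("1", c("1", "31"))

# Original approach: number of keywords present at least once in the column
keywords_found <- sapply(checklist, function(x) sum(x %in% q.data2$string))
# 1 2

# Operands swapped: number of rows whose value equals one of the keywords
rows_matched <- sapply(checklist, function(x) sum(q.data2$string %in% x))
# 2 3
```

Which of the two is appropriate depends on whether "occurrences" means matching rows or distinct keywords found, which the question leaves open.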
