Question

I have a list of keywords:

library(stringr)
words <- as.character(c("decomposed", "no diagnosis","decomposition","autolysed","maggots", "poor body", "poor","not suitable", "not possible"))

I want to match these keywords against the text in a data frame column (df$text) and store the number of times each keyword occurs in a separate data.frame (matchdf):

matchdf<- data.frame(Keywords=words)
m_match<-sapply(1:length(words), function(x) sum(str_count(tolower(df$text),words[[x]])))
matchdf$matchs<-m_match

However, I noticed that this method counts every occurrence of each keyword in the column. For example:

"The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time"

This returns a count of 2. However, I only want to count the first instance of "decomposed" within a field.

I thought there would be a way to count only the first instance with str_count, but there does not seem to be one.
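
The discrepancy can be reproduced on the example sentence alone. This is a minimal sketch that uses base R only: gregexpr and regmatches stand in for str_count (an assumption made to keep the example free of package dependencies), and the variable names are illustrative.

```r
text <- "The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time"

# Every occurrence of the keyword in the field (what the str_count approach reports)
all_hits <- lengths(regmatches(text, gregexpr("decomposed", text, fixed = TRUE)))

# At most one hit per field (the count that is actually wanted)
wanted <- as.integer(grepl("decomposed", text, fixed = TRUE))

all_hits
wanted
```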

Answer 1

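In this example, stringr is not strictly necessary; grepl from base R is sufficient. That said, you can use str_detect instead of grepl if you prefer the package function (as @Chi-Pak pointed out in the comments).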

library(stringr)

words <- c("decomposed", "no diagnosis","decomposition","autolysed","maggots", 
           "poor body", "poor","not suitable", "not possible")

df <- data.frame( text = "The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time")

matchdf <- data.frame(Keywords = words, stringsAsFactors = FALSE)

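# Base R: grepl() is TRUE at most once per row, so repeats within a row are ignored;
# sum() then gives the number of rows that contain each keyword
matchdf$matches1 <- sapply(seq_along(words), function(x) sum(grepl(words[x], tolower(df$text))))

# stringr equivalent using str_detect()
matchdf$matches2 <- sapply(seq_along(words), function(x) sum(str_detect(tolower(df$text), words[[x]])))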

matchdf
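
The example above has a single row, so every count is 0 or 1. With several rows, one way to keep "at most one hit per field" is to sum the per-row detections. The sketch below assumes a hypothetical three-row data frame (only the first text comes from the question) and uses fixed = TRUE on the assumption that the keywords are literal strings, not regular expressions.

```r
words <- c("decomposed", "no diagnosis", "decomposition", "autolysed", "maggots",
           "poor body", "poor", "not suitable", "not possible")

# Hypothetical data: rows 2 and 3 are invented for illustration
df <- data.frame(
  text = c(
    "The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time",
    "Poor body condition; carcass decomposed and full of maggots",
    "Not possible to examine, no diagnosis"
  ),
  stringsAsFactors = FALSE
)

# Lower-case once instead of once per keyword
text_lower <- tolower(df$text)

matchdf <- data.frame(Keywords = words, stringsAsFactors = FALSE)

# grepl() yields one TRUE/FALSE per row; sum() counts the rows containing the keyword
matchdf$rows_matched <- vapply(
  words,
  function(w) sum(grepl(w, text_lower, fixed = TRUE)),
  integer(1),
  USE.NAMES = FALSE
)

matchdf
```

Here "decomposed" is counted twice, once for each row that contains it, even though the first row mentions it two times. Matching is by substring, so "poor" is also counted for any row that contains "poor body".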

Count only the first instance of each keyword in a list, with no repeated counts, in R
