I have a list of keywords:
library(stringr)
words <- as.character(c("decomposed", "no diagnosis","decomposition","autolysed","maggots", "poor body", "poor","not suitable", "not possible"))
I want to match these keywords against the text in a data frame column (df$text) and record how many times each keyword occurs in a separate data.frame (matchdf):
matchdf<- data.frame(Keywords=words)
m_match<-sapply(1:length(words), function(x) sum(str_count(tolower(df$text),words[[x]])))
matchdf$matchs<-m_match
However, I noticed that this method counts every occurrence of each keyword in the column. For example:
"The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time"
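This returns a count of 2. However, I only want to count the first instance of "decomposed" in each field.
I thought there would be a way to count only the first instance with str_count, but there doesn't seem to be one.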
Answer 0 (score: 1)
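stringr isn't strictly necessary in this example; grepl from base R is enough. That said, you can use str_detect instead of grepl if you prefer the package function (as @Chi-Pak pointed out in the comments). Both only report whether the keyword occurs at all, so converting the result to numeric gives 1 for a match and 0 otherwise, no matter how many times the keyword appears: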
library(stringr)
words <- c("decomposed", "no diagnosis","decomposition","autolysed","maggots",
"poor body", "poor","not suitable", "not possible")
df <- data.frame( text = "The sample was too decomposed to perform an analysis. The decomposed sample indicated that this animal was dead for a long time")
matchdf <- data.frame(Keywords = words, stringsAsFactors = FALSE)
# Base R grepl
matchdf$matches1 <- sapply(1:length(words), function(x) as.numeric(grepl(words[x], tolower(df$text))))
# Stringr function
matchdf$matches2 <- sapply(1:length(words), function(x) as.numeric(str_detect(tolower(df$text),words[[x]])))
matchdf
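The example above uses a single row of text. If df$text has several rows, one way to extend the same idea is to sum the per-row grepl results, so that each row contributes at most 1 per keyword. The following is only a sketch under assumptions: the three sample rows, the shortened keyword list, and the helper name count_rows are made up for illustration, and fixed = TRUE is an added choice so that keywords are matched literally rather than as regular expressions.

```r
# Hypothetical multi-row data; only the column name `text` comes from the question.
df <- data.frame(
  text = c(
    "The sample was too decomposed. The decomposed sample was discarded.",
    "Carcass autolysed; poor body condition.",
    "No diagnosis was possible because the sample was decomposed."
  ),
  stringsAsFactors = FALSE
)

# Shortened keyword list, for illustration only.
words <- c("decomposed", "no diagnosis", "autolysed", "poor body", "poor")

# Hypothetical helper: grepl() gives one TRUE/FALSE per row, so sum() counts
# the rows that mention the keyword at least once (repeats within a row are ignored).
count_rows <- function(word, text) {
  sum(grepl(word, tolower(text), fixed = TRUE))
}

matchdf <- data.frame(
  Keywords = words,
  matches = vapply(words, count_rows, integer(1), text = df$text, USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
matchdf
```

Here "decomposed" is counted twice because it appears in two different rows, even though the first row mentions it twice. Note that "poor" also matches inside "poor body", since this is plain substring matching.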