Question

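I can't find an answer on how to count a word in a data frame while excluding rows where another word is also found. I have the following df: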

words <- c("INSTANCE find", "LA LA LA", "instance during",
           "instance", "instance", "instance", "find instance")

df <- data.frame(words)
df$words_count <- grepl("instance", df$words, ignore.case = T)

It picks up every row containing "instance". I have been trying to exclude any row where "find" is also found.

I could add another grepl to look for "find" and exclude based on that, but I'm trying to limit the number of lines of code.
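
For reference, the two-call version mentioned above might look like the sketch below. It is only the baseline being described, not a recommendation, and the `has_instance` and `has_find` names are made up for illustration.

```r
words <- c("INSTANCE find", "LA LA LA", "instance during",
           "instance", "instance", "instance", "find instance")
df <- data.frame(words)

# Illustrative helper vectors (names are assumptions, not from the post)
has_instance <- grepl("instance", df$words, ignore.case = TRUE)
has_find     <- grepl("find", df$words, ignore.case = TRUE)

# Keep rows that mention "instance" but not "find"
df$words_count <- has_instance & !has_find
```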

Answer 1

I'm sure there is a solution using a single regular expression, but you can do

# TRUE - FALSE = 1 only when 'instance' matches and 'find' does not
df$words_count <- Reduce(`-`, lapply(c('instance', 'find'), grepl, df$words, ignore.case = TRUE)) > 0

or

df$words_count <- Reduce(`&`, lapply(c('instance', '^((?!find).)*$'), grepl, df$words, perl = TRUE, ignore.case = TRUE))

This may be easier to read:

library(tidyverse)
df$words_count <- c('instance', '^((?!find).)*$') %>%
                    lapply(grepl, df$words, perl = TRUE, ignore.case = TRUE) %>%
                    reduce(`&`)
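
For completeness, here is a guess at what the single-regex version might look like, using a negative lookahead (which needs `perl = TRUE`). Treat it as a sketch that has only been checked against the sample data in the question:

```r
words <- c("INSTANCE find", "LA LA LA", "instance during",
           "instance", "instance", "instance", "find instance")
df <- data.frame(words)

# ^(?!.*find) : from the start, fail if "find" occurs anywhere in the string
# .*instance  : otherwise require "instance" somewhere in the string
df$words_count <- grepl("^(?!.*find).*instance", df$words,
                        perl = TRUE, ignore.case = TRUE)
df$words_count
# [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```

The lookahead is anchored at the start so it is evaluated once for the whole string, which avoids the per-character check in `^((?!find).)*$`.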

Answer 2

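If what you need is the number of times "instance" appears in each string, with the whole count zeroed out whenever "find" appears anywhere in that string: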

df$counts <- sapply(gregexpr("\\binstance\\b", words, ignore.case=TRUE), function(a) length(a[a>0])) *
  !grepl("\\bfind\\b", words, ignore.case=TRUE)
df
#             words counts
# 1   INSTANCE find      0
# 2        LA LA LA      0
# 3 instance during      1
# 4        instance      1
# 5        instance      1
# 6        instance      1
# 7   find instance      0
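
The sample data never has more than one "instance" per string, so here is a small sketch on made-up strings (the vector `x` is hypothetical) showing that the count really is per occurrence. It uses `regmatches()` plus `lengths()` as an alternative to filtering out the `-1` that `gregexpr()` returns when nothing matches:

```r
# Hypothetical strings, not from the question's data
x <- c("instance and INSTANCE", "instance find instance", "no match here")

# Locate every whole-word, case-insensitive occurrence of "instance"
m <- gregexpr("\\binstance\\b", x, ignore.case = TRUE)

# regmatches() yields character(0) for strings with no match, so lengths() gives 0
n_instance <- lengths(regmatches(x, m))

# Multiply by FALSE (0) wherever "find" appears to zero out that string's count
counts <- n_instance * !grepl("\\bfind\\b", x, ignore.case = TRUE)
counts
# [1] 2 0 0
```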
