Question

Total newbie R question. I have a data frame df with ID and Notes columns:

ID    Notes
1     dogs are friendly
2     dogs and cats are pets
3     cows live on farms
4     cats and cows start with c

I have another list of values, "animals":

cats
cows

I want to add another column, "Matches", to the data frame, containing all the animals that appear in Notes, e.g.

ID    Notes                        Matches
1     dogs are friendly            
2     dogs and cats are pets       cats
3     cows live on farms           cows
4     cats and cows start with c   cats, cows

So far, my only luck has been using grepl, which returns whether there is any match at all:

grepl(paste(animals,collapse="|"),df$Notes,ignore.case = T)

How do I return the matched values instead?

Update
Some rows in my data frame contain multiple instances of the same animal, e.g. cats, in my notes:

ID    Notes                             Matches
1     dogs are friendly            
2     dogs and cats are pets            cats
3     cows live on farms                cows
4     cats and cats cows start with c   cats, cows

I only want to return one instance of each match. @LachlanO got me very close with his solution, but I got:

[1] "NA, NA"                      "cats, NA"                    "NA, cows"                    "c(\"cats\", \"cats\"), cows"

How can I return only the distinct matches?

Answer 1

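Edit: added a unique operation to handle duplicate matches.

I can get you started and then point you in the right direction :)

The code below uses stringr::str_extract_all to extract the relevant bits we need, but unfortunately it also leaves us with some bits we don't want, most notably the empty results where nothing matched. The unique function in the middle of our custom function just makes sure we get the unique matches for each element.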

ID = seq(1,4)
Notes <- c(
  "dogs are friendly",
  "dogs and cats are pets",
  "cows live on farms",
  "cats and cows start with c "
)
df <- data.frame(ID, Notes)

animals = c("cats", "cows")

matches <- as.data.frame(sapply(animals, function(x){sapply(stringr::str_extract_all(df$Notes, x), unique)}, simplify = TRUE))
matches[matches == "character(0)"] <- NA

apply(matches, 1, paste, collapse = ", ")
[1] "NA, NA"     "cats, NA"   "NA, cows"   "cats, cows"

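You could set this as your extra column, but it isn't pretty because of those NAs. If there were a paste function that ignored NAs, we'd be set.

Luckily, another user has already solved that problem :) Check out this answer here.

Combining that with the above should give you a suitable solution!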
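The NA-ignoring paste mentioned above can be sketched in base R as follows. This is a minimal illustration, not the code from the linked answer: the helper name paste_no_na is hypothetical, and the matches data frame is a hand-written stand-in for the one built earlier.

```r
# Hypothetical helper: paste only the non-NA values of a vector.
paste_no_na <- function(x, sep = ", ") {
  paste(x[!is.na(x)], collapse = sep)
}

# Hand-written stand-in for the `matches` data frame built above.
matches <- data.frame(
  cats = c(NA, "cats", NA, "cats"),
  cows = c(NA, NA, "cows", "cows"),
  stringsAsFactors = FALSE
)

# Apply the helper row by row; rows with no match become empty strings.
result <- unname(apply(matches, 1, paste_no_na))
result
```

Assigning result to a new column (for example df$Matches <- result) would then give the layout asked for in the question.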

Answer 2

Here's how I would do it:

animals = c("cats", "cows")
reg = paste(animals, collapse = "|")

library(stringr)
matches = str_extract_all(Notes, reg)
matches = lapply(matches, unique)
matches = sapply(matches, paste, collapse = ",")

df$matches = matches
df
#   ID                       Notes   matches
# 1  1           dogs are friendly          
# 2  2      dogs and cats are pets      cats
# 3  3          cows live on farms      cows
# 4  4 cats and cows start with c  cats,cows

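If you want to get fancy about it, paste word boundaries onto the regex, e.g. reg = paste0("\\b", animals, "\\b", collapse = "|"), to avoid extracting the middle of words. (Use paste0 rather than paste here: paste would insert spaces between the boundary markers and the words.)

Using the data provided by LachlanO: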
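To see what the word boundaries change, here is a small sketch. It assumes a base R equivalent of the stringr call (regmatches with gregexpr) so that it runs without extra packages; the notes vector and the helper extract_unique are made up for illustration.

```r
animals <- c("cats", "cows")
# Made-up notes where the animal names also occur inside longer words.
notes <- c("wildcats and cows", "cats chase scows")

plain   <- paste(animals, collapse = "|")
bounded <- paste0("\\b", animals, "\\b", collapse = "|")

# Hypothetical helper: extract all matches, de-duplicate, and join per element.
extract_unique <- function(text, pattern) {
  found <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  sapply(found, function(m) paste(unique(m), collapse = ","))
}

without_boundaries <- extract_unique(notes, plain)
with_boundaries    <- extract_unique(notes, bounded)
```

Without boundaries, "wildcats" and "scows" both count as matches; with boundaries, only the standalone words do.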

ID = seq(1,4)
Notes <- c(
  "dogs are friendly",
  "dogs and cats are pets",
  "cows live on farms",
  "cats and cows start with c "
)
df <- data.frame(ID, Notes)

Answer 3

You can use gsub to get all the animals at once:

gsub(".*?(cows|cats )|.*","\\1",do.call(paste,df),perl = T)
[1] ""          "cats "     "cows"      "cats cows"

So, written in a single pass:

transform(df,matches=gsub(".*?(cows|cats )|.*","\\1",do.call(paste,df),perl = T))
  ID                       Notes   matches
1  1           dogs are friendly          
2  2      dogs and cats are pets     cats 
3  3          cows live on farms      cows
4  4 cats and cows start with c  cats cows
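The pattern above hard-codes the animal names and keeps repeated matches. As a possible variation (a sketch under assumptions, not this answer's method), the alternation can be built from the animals vector and de-duplicated with base R's regmatches and gregexpr. The data below uses the row from the question's update that contains "cats" twice.

```r
df <- data.frame(
  ID = 1:4,
  Notes = c("dogs are friendly",
            "dogs and cats are pets",
            "cows live on farms",
            "cats and cats cows start with c"),
  stringsAsFactors = FALSE
)
animals <- c("cats", "cows")

# Build the alternation from the vector instead of typing it by hand.
pattern <- paste(animals, collapse = "|")

# Extract every match per row, then keep each distinct animal once.
found <- regmatches(df$Notes, gregexpr(pattern, df$Notes, ignore.case = TRUE))
df$matches <- vapply(found,
                     function(m) paste(unique(m), collapse = ", "),
                     character(1))
df
```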

How do I compare a data frame against a list and return the values in the data frame that match the list?
