Total R newbie question. I have a data frame df with ID and Notes:
ID  Notes
1   dogs are friendly
2   dogs and cats are pets
3   cows live on farms
4   cats and cows start with c
I have another list of values, "animals":
cats
cows
I want to add another column, "Matches", to the data frame, containing all of the animals that appear in Notes, e.g.
ID  Notes                        Matches
1   dogs are friendly
2   dogs and cats are pets       cats
3   cows live on farms           cows
4   cats and cows start with c   cats, cows
So far my only luck has been using grepl, which returns whether there is any match:
grepl(paste(animals,collapse="|"),df$Notes,ignore.case = T)
How do I return the matched values themselves?
UPDATE
Some rows in my data frame have multiple instances of cats in the notes, e.g.:
ID  Notes                             Matches
1   dogs are friendly
2   dogs and cats are pets            cats
3   cows live on farms                cows
4   cats and cats cows start with c   cats, cows
I only want to return one instance of each match. @LachlanO got me very close with his solution, but I get:
[1] "NA, NA" "cats, NA" "NA, cows" "c(\"cats\", \"cats\"), cows"
How can I return only the distinct matches?
Answer 0 (score: 1)
Edit: added the unique operation to handle duplicate matches.
I can get you started and then point you in the right direction :)
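The code below uses stringr::str_extract_all to extract the bits we need, but unfortunately it also leaves us with some bits we don't want, most notably empty results where a note has no match. The unique call in the middle of our custom function just makes sure we keep only the distinct matches for each note.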
ID = seq(1,4)
Notes <- c(
"dogs are friendly",
"dogs and cats are pets",
"cows live on farms",
"cats and cows start with c "
)
df <- data.frame(ID, Notes)
animals = c("cats", "cows")
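# For each animal, extract its matches from every note and keep the unique ones
matches <- as.data.frame(sapply(animals, function(x) {
  sapply(stringr::str_extract_all(df$Notes, x), unique)
}, simplify = TRUE))
# Notes with no match come back as character(0); mark them as NA
matches[matches == "character(0)"] <- NA
# Join the per-animal results row by row
apply(matches, 1, paste, collapse = ", ")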
[1] "NA, NA" "cats, NA" "NA, cows" "cats, cows"
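You could set this as your extra column, but it doesn't look great because of those NAs. If there were a paste function that ignored NAs, we'd be set.
Luckily, another user has already solved this problem :) Check out this answer here.
Combining that with the above should give you a suitable solution!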
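The linked answer isn't reproduced here, so this is only a minimal sketch of the idea: drop the NAs before pasting each row. The helper name `paste_no_na` and the stand-in matrix `m` (mimicking the per-row results above) are hypothetical, not taken from the linked answer.

```r
# Hypothetical helper: paste only the non-NA elements of a vector.
paste_no_na <- function(x, collapse = ", ") {
  paste(x[!is.na(x)], collapse = collapse)
}

# Stand-in for the per-row match results above: one row per note,
# one column per animal, NA meaning "no match".
m <- rbind(c(NA, NA), c("cats", NA), c(NA, "cows"), c("cats", "cows"))

apply(m, 1, paste_no_na)
# returns c("", "cats", "cows", "cats, cows")
```

A row with no matches collapses to an empty string rather than "NA, NA", which is what the Matches column in the question expects.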
Answer 1 (score: 0)
Here's how I would do it:
library(stringr)
animals = c("cats", "cows")
reg = paste(animals, collapse = "|")
matches = str_extract_all(df$Notes, reg)   # all hits for each note
matches = lapply(matches, unique)          # drop repeats such as "cats", "cats"
matches = sapply(matches, paste, collapse = ",")
df$matches = matches
df
# ID Notes matches
# 1 1 dogs are friendly
# 2 2 dogs and cats are pets cats
# 3 3 cows live on farms cows
# 4 4 cats and cows start with c cats,cows
If you want to be more careful, put word boundaries on the regex, e.g. reg = paste0("\\b", animals, "\\b", collapse = "|"), to avoid extracting matches from the middle of words.
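A quick check of what the boundaries change. This sketch uses base R's `regmatches()`/`gregexpr()` instead of stringr so it runs without extra packages, and the test string "bobcats and cows" is made up for illustration. Note the `paste0` (equivalently `sep = ""`): plain `paste()` with its default `sep = " "` would put spaces inside the pattern.

```r
animals <- c("cats", "cows")

# Without boundaries: "cats" also matches inside "bobcats".
reg_plain <- paste(animals, collapse = "|")
# With boundaries: builds "\\bcats\\b|\\bcows\\b" with no stray spaces.
reg_bounded <- paste0("\\b", animals, "\\b", collapse = "|")

x <- "bobcats and cows"
regmatches(x, gregexpr(reg_plain, x, perl = TRUE))[[1]]    # returns c("cats", "cows")
regmatches(x, gregexpr(reg_bounded, x, perl = TRUE))[[1]]  # returns "cows"
```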
Using the data provided by LachlanO:
ID = seq(1,4)
Notes <- c(
"dogs are friendly",
"dogs and cats are pets",
"cows live on farms",
"cats and cows start with c "
)
df <- data.frame(ID, Notes)
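For anyone who would rather not depend on stringr, here is a minimal base-R sketch of the same steps (extract, unique, paste), swapping `regmatches()`/`gregexpr()` in for `str_extract_all()`. It is a variant, not the answer's exact code, and it uses the question's updated row 4 ("cats and cats cows start with c") to show the repeated "cats" being dropped.

```r
# Data from the question; row 4 is the "cats and cats" variant from the update.
df <- data.frame(
  ID = 1:4,
  Notes = c("dogs are friendly",
            "dogs and cats are pets",
            "cows live on farms",
            "cats and cats cows start with c"),
  stringsAsFactors = FALSE
)
animals <- c("cats", "cows")
reg <- paste0("\\b", animals, "\\b", collapse = "|")

# Base-R stand-in for str_extract_all(): one character vector of hits per note
# (character(0) when a note has no hits).
hits <- regmatches(df$Notes, gregexpr(reg, df$Notes, perl = TRUE))

# Keep each animal once per note, then join; notes with no hits become "".
df$matches <- vapply(hits, function(h) paste(unique(h), collapse = ","), character(1))
df$matches
# returns c("", "cats", "cows", "cats,cows")
```

`vapply(..., character(1))` is used instead of `sapply` so the result is guaranteed to be one string per note.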
Answer 2 (score: 0)
You can use gsub to get all of the animals at once:
gsub(".*?(cows|cats )|.*","\\1",do.call(paste,df),perl = TRUE)
[1] "" "cats " "cows" "cats cows"
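To see what that call works on: `do.call(paste, df)` pastes each row's ID and Notes into one string, and the regex then keeps only the captured animal names. A small sketch on the same data (the intermediate name `rows` is just for illustration):

```r
df <- data.frame(
  ID = 1:4,
  Notes = c("dogs are friendly", "dogs and cats are pets",
            "cows live on farms", "cats and cows start with c "),
  stringsAsFactors = FALSE
)

# Equivalent to paste(df$ID, df$Notes): the columns joined row-wise with a space.
rows <- do.call(paste, df)
rows[4]
# returns "4 cats and cows start with c "

# Each ".*?(cows|cats )" match is replaced by the captured animal; the final
# ".*" branch swallows leftover text, and since the group is unset there,
# "\\1" contributes nothing.
gsub(".*?(cows|cats )|.*", "\\1", rows, perl = TRUE)
# returns c("", "cats ", "cows", "cats cows")
```

The space captured in "cats " is what separates "cats" from "cows" in row 4.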
So, as a one-liner:
transform(df,matches=gsub(".*?(cows|cats )|.*","\\1",do.call(paste,df),perl = TRUE))
ID Notes matches
1 1 dogs are friendly
2 2 dogs and cats are pets cats
3 3 cows live on farms cows
4 4 cats and cows start with c cats cows