I have two different data frames in R.
A - this dataset has the following data:
cat
dog
Rat
Parrot
Tiger
B - this dataset has the following data:
Give milk to cat
dog bites
life span of dog is 10 years
Cow gives us milk
Tiger have huge Jaws
Now the R code must search the whole of dataset B for each value in dataset A.
Answer 0 (score: 1)
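One option is to use apply and look up each word of df_A in every row of df_B. The OP has not clearly specified the expected output format. The words from df_A that were found can be listed in the final output using unlist and unique. Note that this matches whole, space-separated words and is case-sensitive.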
library(dplyr)
apply(df_B,1, function(x){
df_A$Word[(df_A$Word %in% strsplit(x, split=" ")[[1]])]
}) %>% unlist() %>% unique()
#[1] "cat" "dog" "Tiger"
# If the objective is to find which rows in df_B contain at least one word from df_A:
df_B$Have_A <- mapply(function(x){
any(df_A$Word %in% strsplit(x, split=" ")[[1]])
}, df_B$Text)
df_B
#                           Text Have_A
# 1             Give milk to cat   TRUE
# 2                    dog bites   TRUE
# 3 life span of dog is 10 years   TRUE
# 4            Cow gives us milk  FALSE
# 5         Tiger have huge Jaws   TRUE
Data:
df_B <- read.table(text =
"Text
'Give milk to cat'
'dog bites'
'life span of dog is 10 years'
'Cow gives us milk'
'Tiger have huge Jaws'",
header = TRUE, stringsAsFactors = FALSE)
df_A <- read.table(text =
"Word
cat
dog
Rat
Parrot
Tiger",
header = TRUE, stringsAsFactors = FALSE)
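As a variation on the approach above, the following sketch records which words from df_A occur in each row of df_B, rather than only a TRUE/FALSE flag. It uses base R only. The column name Matched and the decision to lower-case both sides before comparing are assumptions made for illustration; the question does not say whether matching should be case-sensitive.

```r
# Same example data as above, rebuilt so the snippet runs on its own
df_A <- data.frame(Word = c("cat", "dog", "Rat", "Parrot", "Tiger"),
                   stringsAsFactors = FALSE)
df_B <- data.frame(Text = c("Give milk to cat", "dog bites",
                            "life span of dog is 10 years",
                            "Cow gives us milk", "Tiger have huge Jaws"),
                   stringsAsFactors = FALSE)

# Split each sentence into lower-case tokens on whitespace
tokens <- strsplit(tolower(df_B$Text), "\\s+")

# For every row, keep the words of df_A whose lower-case form is among the tokens
matched <- lapply(tokens, function(tok) df_A$Word[tolower(df_A$Word) %in% tok])

# Collapse the matches into one string per row ("" when nothing matched).
# 'Matched' is a hypothetical column name chosen for this sketch.
df_B$Matched <- vapply(matched, paste, character(1), collapse = ", ")
df_B
```

Because the comparison is done on whole tokens, a word such as "cat" would not be reported for a sentence containing only "cats"; whether that is desirable depends on the intended use.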
Answer 1 (score: 1)
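We can paste together the elements of the column in dataset 'A' and use the result as the pattern in grepl, which checks the strings in the column of dataset 'B' and returns a logical vector. The \\b word boundaries restrict matches to whole words, and ignore.case = TRUE makes the match case-insensitive.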
i1 <- grepl(paste0("\\b(", paste(A$col, collapse="|"), ")\\b"),
B$col, ignore.case = TRUE)
i1
#[1] TRUE TRUE TRUE FALSE TRUE
B$col[i1]
Data:
A <- structure(list(col = c("cat", "dog", "Rat", "Parrot", "Tiger"
)), .Names = "col", class = "data.frame", row.names = c(NA, -5L
))
B <- structure(list(col = c("Give milk to cat", "dog bites",
"life span of dog is 10 years",
"Cow gives us milk", "Tiger have huge Jaws")), .Names = "col",
class = "data.frame", row.names = c(NA, -5L))
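If it is also useful to know which word matched in each row, the same alternation pattern can be passed to gregexpr and regmatches. This is a minimal sketch under the assumption that the words in A contain no regular-expression metacharacters (true for the example data); the names pat, hits and has_match are introduced here for illustration only.

```r
# Same example data as above, rebuilt so the snippet runs on its own
A <- data.frame(col = c("cat", "dog", "Rat", "Parrot", "Tiger"),
                stringsAsFactors = FALSE)
B <- data.frame(col = c("Give milk to cat", "dog bites",
                        "life span of dog is 10 years",
                        "Cow gives us milk", "Tiger have huge Jaws"),
                stringsAsFactors = FALSE)

# Whole-word alternation pattern built from the words in A
pat <- paste0("\\b(", paste(A$col, collapse = "|"), ")\\b")

# gregexpr() locates every match per string; regmatches() extracts the matched text.
# Rows without any match yield character(0).
hits <- regmatches(B$col, gregexpr(pat, B$col, ignore.case = TRUE))

# TRUE where at least one word from A was found (same result as the grepl call)
has_match <- lengths(hits) > 0

# Rows of B that contain at least one word from A
B$col[has_match]
```

The extracted text is returned as it appears in B, so with ignore.case = TRUE its capitalisation may differ from the spelling stored in A.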