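I am trying to add a new column, keywords, that gets the value TRUE if the word appears in my keyword list. If the word does not appear in keywordslist, the value should be FALSE. My keyword list contains more than 100 words, so adding them by hand is not an option.
A sample of the keyword list (keywordlist):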
thank
impressed
this
I have a data frame with the columns id and word, which I have unnested into one word per row and grouped by id:
id word
1234 thank
1234 you
1234 very
1234 much
1567 i
1567 am
1567 not
1567 impressed
9654 what
9654 is
9654 this
I would like the result to look like this:
id word keywords
1234 thank TRUE
1234 you FALSE
1234 very FALSE
1234 much FALSE
1567 i FALSE
1567 am FALSE
1567 not FALSE
1567 impressed TRUE
9654 what FALSE
9654 is FALSE
9654 this TRUE
The code I have tried is as follows. Attempt 1:
df <- df %>%
group_by(id) %>%
mutate(keywords = ifelse(
word == rowwise(keywordslist), TRUE, FALSE))
Attempt 1 throws the following error:
Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE.
I then tried a different variant using grepl:
df <- df %>%
group_by(id) %>%
mutate(keywords = ifelse(
word == rowwise(grepl(keywordslist, word)), TRUE, FALSE))
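This throws the following error:
Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE. In addition: Warning message: In grepl(keywordslist, keywords) : argument 'pattern' has length > 1 and only the first element will be used
I am not sure whether this is the right way to approach the problem. Any help is welcome.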
Answer 0 (score: 3)
df$keywords <- df$word %in% keywordslist
This should do it.
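To see why `%in%` works where `==` does not, here is a minimal sketch built on the sample data from the question. `==` compares element by element and recycles the shorter vector, while `%in%` tests each word for membership in the whole list. The name `eq_result` is only illustrative.

```r
# Sample data as given in the question
df <- data.frame(
  id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L,
         1567L, 1567L, 9654L, 9654L, 9654L),
  word = c("thank", "you", "very", "much", "i", "am",
           "not", "impressed", "what", "is", "this"),
  stringsAsFactors = FALSE
)
keywordslist <- c("thank", "impressed", "this")

# `==` recycles keywordslist along df$word, so each word is compared
# with only one keyword (R also warns that 11 is not a multiple of 3)
eq_result <- suppressWarnings(df$word == keywordslist)

# `%in%` checks every word against the complete keyword list
df$keywords <- df$word %in% keywordslist

print(eq_result)    # misses "this" in the last row
print(df$keywords)
```

With this data the `==` version happens to catch "thank" and "impressed" only because they line up with the recycled keywords; "this" is missed.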
Answer 1 (score: 0)
You can do the following:
library(dplyr)
df1 %>%
mutate(keywords = word %in% keywordlist)
# id word keywords
#1 1234 thank TRUE
#2 1234 you FALSE
#3 1234 very FALSE
#4 1234 much FALSE
#5 1567 i FALSE
#6 1567 am FALSE
#7 1567 not FALSE
#8 1567 impressed TRUE
#9 9654 what FALSE
#10 9654 is FALSE
#11 9654 this TRUE
Or with base R:
df1$keywords <- sapply(df1, function(x) x %in% keywordlist)[,2]
# id word keywords
#1 1234 thank TRUE
#2 1234 you FALSE
#3 1234 very FALSE
#4 1234 much FALSE
#5 1567 i FALSE
#6 1567 am FALSE
#7 1567 not FALSE
#8 1567 impressed TRUE
#9 9654 what FALSE
#10 9654 is FALSE
#11 9654 this TRUE
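The `sapply` call above tests every column against the keyword list and then keeps only the second one (`word`). Assuming `df1` has the same structure as the sample data, testing the `word` column directly should give the same result with less work. The names `via_sapply` and `direct` are only illustrative.

```r
# df1 is assumed to match the sample data from the question
df1 <- data.frame(
  id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L,
         1567L, 1567L, 9654L, 9654L, 9654L),
  word = c("thank", "you", "very", "much", "i", "am",
           "not", "impressed", "what", "is", "this"),
  stringsAsFactors = FALSE
)
keywordlist <- c("thank", "impressed", "this")

# Column-wise test, then pick the column for `word`
via_sapply <- sapply(df1, function(x) x %in% keywordlist)[, 2]

# Test the `word` column only
direct <- df1$word %in% keywordlist

# Both routes flag the same rows
print(all(via_sapply == direct))
```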
Answer 2 (score: 0)
A dplyr approach could be:
library(dplyr)
df %>%
mutate(keywords = grepl(paste(keywordlist, collapse = "|"), word))
which gives
id word keywords
1 1234 thank TRUE
2 1234 you FALSE
3 1234 very FALSE
4 1234 much FALSE
5 1567 i FALSE
6 1567 am FALSE
7 1567 not FALSE
8 1567 impressed TRUE
9 9654 what FALSE
10 9654 is FALSE
11 9654 this TRUE
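One caveat with the `grepl` approach: a pattern built with `paste(..., collapse = "|")` matches substrings, so a word that merely contains a keyword is flagged as well. The sketch below anchors the pattern so that the whole word must match. The test words are made up for illustration and are not from the question, and it assumes the keywords contain no regex metacharacters.

```r
keywordlist <- c("thank", "impressed", "this")

# Hypothetical words chosen to show the difference
words <- c("thank", "thanks", "unimpressed", "this")

# Unanchored: any word containing a keyword matches
loose <- grepl(paste(keywordlist, collapse = "|"), words)

# Anchored: the whole word must equal one of the keywords
pattern <- paste0("^(", paste(keywordlist, collapse = "|"), ")$")
strict <- grepl(pattern, words)

# Exact membership test, for comparison
exact <- words %in% keywordlist

print(data.frame(words, loose, strict, exact))
```

If exact matches are all that is needed, `%in%` is probably the simpler choice, since it avoids regex escaping altogether.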
Sample data:
df <- structure(list(id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L,
1567L, 1567L, 9654L, 9654L, 9654L), word = c("thank", "you",
"very", "much", "i", "am", "not", "impressed", "what", "is",
"this")), .Names = c("id", "word"), class = "data.frame", row.names = c(NA,
-11L))
keywordlist <- c("thank", "impressed", "this")