Create a new conditional column if a string contains an element of a list

Asked: 2018-06-15 07:11:57

Tags: r list dplyr mutate

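I am trying to add a new column, keywords, that gets the value TRUE if the word appears in my keyword list, keywordslist. If the word does not appear in the list, the value should be FALSE. My keyword list contains more than 100 words, so adding the words by hand is not an option.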

The keyword list (sample):

thank
impressed
this

I have a data frame with the columns id and word; I have unnested the text into individual words and grouped them by id:

id      word
1234    thank
1234    you
1234    very
1234    much
1567    i
1567    am
1567    not
1567    impressed
9654    what
9654    is
9654    this

I would like the result to look like this:

id      word       keywords
1234    thank      TRUE
1234    you        FALSE
1234    very       FALSE
1234    much       FALSE
1567    i          FALSE
1567    am         FALSE
1567    not        FALSE
1567    impressed  TRUE
9654    what       FALSE
9654    is         FALSE
9654    this       TRUE

The code I have tried so far is below.

1. First attempt:

df <- df %>%
  group_by(id) %>%
  mutate(keywords = ifelse(
  word == rowwise(keywordslist), TRUE, FALSE)

Code #1 throws the following error:

Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE.

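2. A different variant using grepl:

    df <- df %>% group_by(id) %>% mutate(keywords = ifelse( word == rowwise(grepl(keywordslist, word)), TRUE,FALSE)

This throws the following error:

    Error in mutate_impl(.data, dots) : Evaluation error: is.data.frame(data) is not TRUE. In addition: Warning message: In grepl(keywordslist, keywords) : argument 'pattern' has length > 1 and only the first element will be used

I am not sure whether this is the right way to approach the problem. Any help is welcome.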

3 answers:

Answer 0: (score: 3)

df$keywords <- df$word %in% keywordslist

That should do it.
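As a minimal sketch of why a single line is enough: `%in%` is vectorized over its left-hand side, so it returns one logical value per row and needs neither `group_by()` nor `rowwise()`. The data below is a cut-down version of the question's example, rebuilt here only so the snippet runs on its own.

```r
# Cut-down version of the question's data (illustrative subset)
df <- data.frame(
  id   = c(1234L, 1234L, 1567L, 9654L),
  word = c("thank", "you", "impressed", "is"),
  stringsAsFactors = FALSE
)
keywordslist <- c("thank", "impressed", "this")

# %in% checks each element of df$word for membership in keywordslist
# and returns a logical vector of the same length as df$word
df$keywords <- df$word %in% keywordslist

print(df)
```

Because the comparison is an exact match on whole strings, a word such as "is" is not flagged just because it occurs inside the keyword "this".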

Answer 1: (score: 0)

You can do the following:

library(dplyr)

 df1 %>% 
  mutate(keywords = word %in% keywordlist)

#  id      word keywords
#1  1234     thank     TRUE
#2  1234       you    FALSE
#3  1234      very    FALSE
#4  1234      much    FALSE
#5  1567         i    FALSE
#6  1567        am    FALSE
#7  1567       not    FALSE
#8  1567 impressed     TRUE
#9  9654      what    FALSE
#10 9654        is    FALSE
#11 9654      this     TRUE

Or with base R:

df1$keywords <- sapply(df1, function(x) x %in% keywordlist)[,2]


#   id      word keywords
#1  1234     thank     TRUE
#2  1234       you    FALSE
#3  1234      very    FALSE
#4  1234      much    FALSE
#5  1567         i    FALSE
#6  1567        am    FALSE
#7  1567       not    FALSE
#8  1567 impressed     TRUE
#9  9654      what    FALSE
#10 9654        is    FALSE
#11 9654      this     TRUE
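Note that the `sapply()` call tests every column and then keeps the second one by position, so it depends on `word` being column 2. A shorter base R sketch that names the column directly is shown below; `df1` and `keywordlist` are rebuilt with a few illustrative rows so the snippet is self-contained.

```r
# Small stand-in for df1 (illustrative rows only)
df1 <- data.frame(
  id   = c(1234L, 1567L, 9654L),
  word = c("thank", "not", "this"),
  stringsAsFactors = FALSE
)
keywordlist <- c("thank", "impressed", "this")

# Positional variant: builds a logical matrix with one column per
# data frame column, then takes column 2
by_position <- sapply(df1, function(x) x %in% keywordlist)[, 2]

# Named variant: tests only the word column
by_name <- df1$word %in% keywordlist

df1$keywords <- by_name
print(df1)
```

Both variants give the same logical vector here; the named one keeps working if the column order changes.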

Answer 2: (score: 0)

A dplyr approach could be

library(dplyr)

df %>%
  mutate(keywords = grepl(paste(keywordlist, collapse = "|"), word))

which gives

     id      word keywords
1  1234     thank     TRUE
2  1234       you    FALSE
3  1234      very    FALSE
4  1234      much    FALSE
5  1567         i    FALSE
6  1567        am    FALSE
7  1567       not    FALSE
8  1567 impressed     TRUE
9  9654      what    FALSE
10 9654        is    FALSE
11 9654      this     TRUE
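One caveat with the `grepl()` approach: the alternation pattern matches substrings, so a word that merely contains a keyword is also flagged. The sketch below illustrates this with a few made-up words (`words` is a hypothetical vector, not part of the question's data) and shows one way to anchor the pattern, assuming the keywords contain no regex metacharacters.

```r
keywordlist <- c("thank", "impressed", "this")

# Hypothetical test words: only "thank" is an exact keyword
words <- c("thank", "thanks", "unimpressed", "is")

# Unanchored alternation: matches any word containing a keyword
pattern <- paste(keywordlist, collapse = "|")
substring_match <- grepl(pattern, words)

# Anchored alternation: the whole word must equal one of the keywords
anchored <- paste0("^(", pattern, ")$")
exact_match <- grepl(anchored, words)

print(data.frame(words, substring_match, exact_match))
```

With the anchors in place the result agrees with `words %in% keywordlist`; if only whole-word matches are wanted, `%in%` is the simpler choice.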


Sample data:

df <- structure(list(id = c(1234L, 1234L, 1234L, 1234L, 1567L, 1567L, 
1567L, 1567L, 9654L, 9654L, 9654L), word = c("thank", "you", 
"very", "much", "i", "am", "not", "impressed", "what", "is", 
"this")), .Names = c("id", "word"), class = "data.frame", row.names = c(NA, 
-11L))

keywordlist <- c("thank", "impressed", "this")