I have the following sample data frame "df", whose variable "Text" contains text:
df:
Text
1 I like blue shoes.
2 Black is great!
3 Pink and grey books.
4 I don't like grey trousers.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
8 I have a pink bike
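To make the example reproducible, the sample data can be rebuilt as a tibble. This is a minimal sketch that assumes the printed numbers 1 to 8 are only row indices, not a column of the data:

```r
library(tibble)

# Reproducible version of the sample data frame shown above
df <- tibble(
  Text = c(
    "I like blue shoes.",
    "Black is great!",
    "Pink and grey books.",
    "I don't like grey trousers.",
    "Yellow is my favorite colour",
    "No more green!",
    "Cars are red.",
    "I have a pink bike"
  )
)
```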
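I use the following code to keep all rows that contain at least one of the listed words, and it works fine:
library(tidyverse)  # attaches dplyr (filter) and stringr (str_detect)
# Words to look for; named "colours" so the vector is not confused with dplyr::filter()
colours <- c("blue", "green", "yellow", "red")
df2 <-
  df %>%
  # Keep rows whose lower-cased text contains at least one of the words
  filter(str_detect(tolower(Text), paste(colours, collapse = "|")))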
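The pattern built with paste(..., collapse = "|") matches substrings, so "red" would also match inside a word such as "bored". A possible variant, sketched here under the assumption that only whole words should count, adds word boundaries (\\b) and lets stringr::regex() handle case instead of tolower(). The extra row "I feel bored" is hypothetical and only there to show the difference:

```r
library(dplyr)
library(stringr)

# Hypothetical sample including a row where "red" appears only inside another word
df <- tibble::tibble(Text = c("I like blue shoes.", "Black is great!",
                              "Cars are red.", "I feel bored"))

colours <- c("blue", "green", "yellow", "red")

# \\b marks a word boundary, so "red" no longer matches inside "bored";
# ignore_case = TRUE replaces the tolower() call
pattern <- regex(str_c("\\b(", str_c(colours, collapse = "|"), ")\\b"),
                 ignore_case = TRUE)

df2 <- df %>% filter(str_detect(Text, pattern))
```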
df2:
Text
1 I like blue shoes.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
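As an additional condition, I now want to add the combination of "pink" and "grey", so that a row is kept if it contains at least one of the words listed above or both of those two words. My desired data frame looks like this: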
df2:
Text
1 I like blue shoes.
3 Pink and grey books.
5 Yellow is my favorite colour
6 No more green!
7 Cars are red.
Do you know how I can get there? Thanks in advance!
Answer 0 (score: 0)
You can use the & (AND) operator to combine filter conditions (there is also the | OR operator).
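The reason this works is that str_detect() returns one TRUE/FALSE per row, and & and | combine those vectors element by element; filter() then keeps the rows where the combined vector is TRUE. A small sketch on four of the question's sentences (the variable names are illustrative only). Note that the single-character & and | are needed here; && and || are meant for single TRUE/FALSE values:

```r
library(stringr)

text <- tolower(c("I like blue shoes.", "Pink and grey books.",
                  "I don't like grey trousers.", "I have a pink bike"))

# One logical value per element of text
has_colour <- str_detect(text, "blue|green|yellow|red")
has_pink   <- str_detect(text, "pink")
has_grey   <- str_detect(text, "grey")

# Element-wise: a colour word, OR both "pink" and "grey"
keep <- has_colour | (has_pink & has_grey)
keep
#> [1]  TRUE  TRUE FALSE FALSE
```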
> f1
[1] "blue" "green" "yellow" "red"
> f2
[1] "pink" "grey"
> df
# A tibble: 4 x 2
Text1 Text2
<chr> <chr>
1 Yellow This
2 red That
3 Purple grey The
4 green pink other
> filter(df, str_detect(Text1, paste0(f1, collapse = "|")))
# A tibble: 2 x 2
Text1 Text2
<chr> <chr>
1 red That
2 green pink other
> filter(df,
str_detect(Text1, paste0(f1, collapse = "|")) &
str_detect(Text1, paste0(f2, collapse = "|")))
# A tibble: 1 x 2
Text1 Text2
<chr> <chr>
1 green pink other
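Note that the second filter keeps a row only when both str_detect() calls are TRUE. For the case in the question (any word from f1, or both "pink" and "grey"), combine the conditions like this: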
> filter(df,
str_detect(Text1, paste0(f1, collapse = "|")) |
(str_detect(Text1, "pink") & str_detect(Text1, "grey")))
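You can keep combining the & and | operators, grouped with parentheses, to get whatever logical combination you need. Also note that str_detect() is case-sensitive, which is why "Yellow" is not matched in the output above; wrap the column in tolower() as in the question if case should be ignored.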
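If the list of words and combinations grows, the same AND/OR logic can be generated from a list instead of written out by hand. This is a minimal sketch: matches_any_group() is a hypothetical helper (not part of dplyr or stringr), and it assumes each list element is a group whose words must all appear, with a row kept if any group matches. Like the question's code, it uses case-insensitive substring matching:

```r
library(dplyr)
library(stringr)

df <- tibble::tibble(Text = c(
  "I like blue shoes.", "Black is great!", "Pink and grey books.",
  "I don't like grey trousers.", "Yellow is my favorite colour",
  "No more green!", "Cars are red.", "I have a pink bike"
))

# Each element is one group; single words are groups of length one
groups <- list("blue", "green", "yellow", "red", c("pink", "grey"))

# Hypothetical helper: AND within each group, OR across groups
matches_any_group <- function(text, groups) {
  text <- tolower(text)
  per_group <- lapply(groups, function(words) {
    # TRUE where every word of this group occurs in the text
    Reduce(`&`, lapply(words, function(w) str_detect(text, fixed(w))))
  })
  # TRUE where at least one group matched
  Reduce(`|`, per_group)
}

df2 <- df %>% filter(matches_any_group(Text, groups))
```

Applied to the question's data, this keeps rows 1, 3, 5, 6 and 7, which is the desired df2.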