Question

Using tidytext, I have this code:

data(stop_words)
tidy_documents <- tidy_documents %>%
      anti_join(stop_words)

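I expect it to take the data frame named tidy_documents, remove every word that appears in the package's built-in stop_words, and write the result back to a data frame of the same name.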

I get this error:

Error: No common variables. Please specify `by` param. Traceback:

1. tidy_documents %>% anti_join(stop_words)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(expr, envir, enclos)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. anti_join(., stop_words)
10. anti_join.tbl_df(., stop_words)
11. common_by(by, x, y)
12. stop("No common variables. Please specify `by` param.", call. = FALSE)

Answer 1

You can avoid the rather confusing anti_join() function by using the simpler filter() instead:

tidy_documents <- tidy_documents %>%
  filter(!word %in% stop_words$word)
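
To make the filter() approach easy to try, here is a minimal sketch on toy data. The contents of tidy_documents below are hypothetical (the question does not show the real data), and the sketch assumes the token column is named word, as the answer does.

```r
library(dplyr)
library(tidytext)  # provides the stop_words data frame

# Hypothetical stand-in for tidy_documents: one token per row
tidy_documents <- tibble(
  document = c(1, 1, 1, 2, 2),
  word     = c("the", "quick", "fox", "and", "dog")
)

# Keep only the rows whose word is NOT found in stop_words$word
filtered <- tidy_documents %>%
  filter(!word %in% stop_words$word)

filtered
```

Because filter() only looks at the word vector, it does not depend on which other columns the two tables share.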

Answer 2

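Both tidy_documents and stop_words have a list of words under a column named word; however, the columns are in a different order: in stop_words it is the first column, while in your dataset it is the second. That is why the command cannot "match" the two columns and compare the words. Try this:

tidy_documents <- tidy_documents %>%
      anti_join(stop_words, by = c("word" = "word"))

The by argument forces the join to compare the columns named word, regardless of their position.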
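
As far as I know, dplyr matches join columns by name rather than by position, so "No common variables" more likely means the two tables share no column name at all. The sketch below assumes that case: the token column in the toy data is called token, which is a hypothetical name, and the named by vector maps it onto word in stop_words.

```r
library(dplyr)
library(tidytext)  # provides the stop_words data frame

# Hypothetical data whose token column is called `token`, not `word`
tidy_documents <- tibble(
  document = c(1, 1, 2),
  token    = c("the", "fox", "and")
)

# Check the column names of both tables before joining
names(tidy_documents)
names(stop_words)

# Named vector: the name (left) is the column in tidy_documents,
# the value (right) is the column in stop_words
cleaned <- tidy_documents %>%
  anti_join(stop_words, by = c("token" = "word"))

cleaned
```

If names(tidy_documents) already contains word, the plain by = "word" form should be enough.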

Removing stop words with tidytext