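I have this code block that takes the words in my training data set and indexes them by the number of the article they were found in. This is the data frame I am working with. Once the code has run, each word in the Document column is placed in its own cell along with an index to the document it came from (the number in the leftmost column).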
dput(droplevels(head(TrainData)))
structure(list(Class = c("atheism", "atheism", "atheism", "atheism",
"atheism", "atheism"), Document = c(" atheism faq atheist resources ",
" atheism faq introduction to atheism archive name atheism introduction ",
"gospel dating in article mimsy umd edu mangoe cs umd edu charley ",
"university violating separation of church state dmn kepler unh edu ",
"soc motss et al princeton axes matching funds for boy scouts in article ",
"a visit from the jehovah s witnesses in article apr batman bmd trw com"
), Index = 1:6), row.names = c(NA, 6L), class = "data.frame")
The code block that separates each word and indexes it is:
library(tidytext)
TrainData$Index <- seq_len(nrow(TrainData)) # index recording which article each word came from
TrainData_words <- TrainData %>%
  unnest_tokens(word, Document) %>% # one row per word, keeping Class and Index
  filter(str_detect(word, "[a-z']$"),
         !word %in% stop_words$word) # remove stop words (common words that do not affect text mining or correlation)
This used to run fine, but now I get the following error: "Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments"
Running traceback() produces:
> traceback()
12: word %in% stop_words$word
11: match.arg(method)
10: filter(., str_detect(word, "[a-z']$"), !word %in% stop_words$word)
9: filter(., str_detect(word, "[a-z']$"), !word %in% stop_words$word)
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: TrainData %>% unnest_tokens(word, Document) %>% filter(str_detect(word,
"[a-z']$"), !word %in% stop_words$word)
I have read posts about this particular error message, but I still cannot work out what might have been changed or left out. Thanks, everyone.
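One possible lead, not a confirmed diagnosis: frame 11 of the traceback is match.arg(method), and method is an argument of stats::filter(), not of dplyr::filter(). That suggests the bare name filter may be resolving to the stats version in the current session. The sketch below uses only base R to check this; the variable names filter_home, needed and attached are made up for illustration.

```r
# Which namespace does the bare name `filter` resolve to in this session?
# "stats" would mean dplyr's filter() is not the one being called.
filter_home <- environmentName(environment(filter))
print(filter_home)

# Which of the packages the pipeline relies on are currently attached?
# (Assumption: the pipeline needs dplyr for filter(), stringr for
# str_detect(), and tidytext for unnest_tokens() and stop_words.)
needed <- c("dplyr", "stringr", "tidytext")
attached <- needed %in% .packages()
print(setNames(attached, needed))
```

If filter_home comes back as "stats", attaching dplyr with library(dplyr), or writing dplyr::filter(...) explicitly in the pipeline, would be the next thing to try.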