Corpus output includes some structural function words

Time: 2016-01-28 12:02:49

Tags: r tm corpus

Using the tm library, my corpus ends up containing words that come from the VectorSource structure itself:

text <- readLines("some.txt")

finalCorpus <- Corpus(VectorSource(newCorpus))
finalCorpus <- tm_map(finalCorpus, stripWhitespace)
save(finalCorpus, file = "data/DEBUG.Rda")  # DEBUG
df <- data.frame(lapply(finalCorpus, as.character), stringsAsFactors = FALSE)
df
>protracted periods meditation fasting prayer ennui fever energy vigor
>married joseph lee dollars million canadian dollars gbp pastored african
>american church snow hill jersey children died infancy **meta list author
>character datetimestamp list sec min hour mday mon year wday yday isdst
>description character heading character id language en origin character
>X2   X3
>1 list list**

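The words between the ** markers come from the corpus, not from the imported text. Why am I getting them, and how can I remove them (without using tm's removeWords function)?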
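
A possible explanation, offered only as a guess from the snippet: `newCorpus` is never defined in the code shown (the lines read into `text` are not used), and the `X2`/`X3` columns holding `list` would be consistent with `VectorSource()` having been given a corpus object, whose internal parts are content and metadata, rather than a character vector. The sketch below assumes tm 0.6 or later and uses a made-up two-element `text` vector in place of `some.txt`. It builds the corpus from the character vector and pulls out only the document text with `content()`, so no metadata fields should reach the data frame.

```r
library(tm)  # assumption: tm 0.6 or later is installed (it also attaches NLP)

# Hypothetical stand-in for readLines("some.txt")
text <- c("protracted periods  meditation fasting prayer",
          "married joseph lee   dollars million")

# Build the corpus from the character vector itself,
# not from an existing corpus object
finalCorpus <- VCorpus(VectorSource(text))
finalCorpus <- tm_map(finalCorpus, stripWhitespace)

# Extract only the text of each document; content() leaves the metadata out
docText <- vapply(
  finalCorpus,
  function(doc) paste(content(doc), collapse = " "),
  character(1)
)

df <- data.frame(text = unname(docText), stringsAsFactors = FALSE)
print(df)
```

If the metadata words are already part of the strings passed to `VectorSource()`, this will not remove them; in that case the step that produced `newCorpus` is the place to look.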

0 answers:

No answers yet