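How can I apply standard tm operations to a character vector?
(I need a dictionary (for categorization) for a DTM. The text entries have already gone through these operations, so I have to transform my dictionary terms in the same way for them to match.)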
library(tm)
dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila", "orange", "pink", "%", "mm", "mp", "*", "monat")
dicBin.corp <- Corpus(VectorSource(dicBin)) # Initially I hoped that tm_map would work on a vector. Since it doesn't, I tried to transform it to a Corpus
dicBin.corp <- tm_map(dicBin.corp, stemDocument, language = "german")
dicBin.corp <- tm_map(dicBin.corp, stripWhitespace)
dicBin.corp <- tm_map(dicBin.corp, tolower)
At this point, dicBin.corp contains only "%".
Edit:
## transform back to a vector
dicBin <- dicBin.corp # How to do also this properly?
Answer 0 (score: 2)
Try applying the functions directly to the character vector:
stemDocument(dicBin, language="german")
stripWhitespace(dicBin)
tolower(dicBin)
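Note that each of the calls above returns a new vector and leaves dicBin itself unchanged. As a minimal sketch of how the steps could be chained so that every result is kept, the terms can be reassigned after each step, in the same order that was used on the text. This assumes the tm and SnowballC packages are installed; `normalize_terms` is a hypothetical helper name, and whitespace is collapsed with base R's `gsub` here rather than a tm function.

```r
library(tm)  # assumes tm and SnowballC are installed

dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila", "orange",
            "pink", "%", "mm", "mp", "*", "monat")

# Hypothetical helper: applies the question's steps, in the same order,
# to a plain character vector and returns the transformed vector.
normalize_terms <- function(terms, language = "german") {
  terms <- stemDocument(terms, language = language)  # stem each term
  terms <- gsub("[[:space:]]+", " ", terms)          # collapse runs of whitespace
  tolower(terms)                                     # lower-case everything
}

dicBin.norm <- normalize_terms(dicBin)
```

Keeping the order identical to the one applied to the corpus matters, because stemming before or after lower-casing can produce different terms.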
To convert the corpus back to a character vector, try
as.character(dicBin.corp)
# [1] "rosa" "rosig" "grun" "blau" "gelb" "lila" "orang"
# [8] "pink" "%" "mm" "mp" "*" "monat"
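If the corpus route is preferred, the round trip might look like the sketch below. It rests on a few assumptions: a recent tm release, where base functions such as tolower generally need to be wrapped in `content_transformer()` before being passed to `tm_map`; a `VCorpus` built from the vector with `VectorSource`; and extraction of each document's text individually, which tends to be more dependable across versions than calling `as.character` on the whole corpus. The name `dicBin.back` is only illustrative.

```r
library(tm)  # assumes tm and SnowballC are installed

dicBin <- c("rosa", "rosig", "grün ", "Blau", "gelb", "lila", "orange",
            "pink", "%", "mm", "mp", "*", "monat")

# One document per dictionary term
dicBin.corp <- VCorpus(VectorSource(dicBin))

# Same transformations as in the question
dicBin.corp <- tm_map(dicBin.corp, stemDocument, language = "german")
dicBin.corp <- tm_map(dicBin.corp, stripWhitespace)
dicBin.corp <- tm_map(dicBin.corp, content_transformer(tolower))

# Back to a plain character vector: pull the text out of each document
dicBin.back <- unname(vapply(dicBin.corp, as.character, character(1)))
```

Since every document holds exactly one term, `dicBin.back` should have the same length and order as the original dicBin.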