I'm importing a .txt file with Spanish words because I want to create a word cloud.
The problem is that the accent marks don't show up in the word cloud.
For example, "México" is displayed as "mc3a9xico".
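The "c3a9" in "mc3a9xico" is the UTF-8 byte sequence for "é". The first part of the short R sketch below shows this with charToRaw(). The second part is only a plausible reconstruction, and the thread does not confirm it. In a non-UTF-8 locale, R can display those two bytes as "<c3><a9>". If that escaped form ends up in the text, a [[:punct:]] punctuation filter deletes the angle brackets and leaves only the hex digits.

```r
# UTF-8 encodes "é" (U+00E9) as the two bytes 0xC3 0xA9
e_bytes <- as.character(charToRaw(enc2utf8("\u00e9")))
print(e_bytes)  # [1] "c3" "a9"

# Assumption: the bytes were rendered as the escape "<c3><a9>" in a non-UTF-8 locale
escaped <- paste0("m", paste0("<", e_bytes, ">", collapse = ""), "xico")
print(escaped)  # [1] "m<c3><a9>xico"

# Stripping punctuation removes "<" and ">", leaving the hex digits behind
print(gsub("[[:punct:]]", "", escaped))  # [1] "mc3a9xico"
```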
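library(tm)
library(wordcloud)  # also attaches RColorBrewer, which provides brewer.pal()
# Read the file and mark its lines as UTF-8 so accented letters such as "é" are handled as UTF-8
text <- readLines(file.choose(), encoding = "UTF-8")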
# Load the data as a corpus
docs <- Corpus(VectorSource(text))
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common stopwords (the text is Spanish, so use tm's Spanish list rather than "english")
docs <- tm_map(docs, removeWords, stopwords("spanish"))
# Remove your own stopwords, specified as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra whitespace
docs <- tm_map(docs, stripWhitespace)
# Text stemming
# docs <- tm_map(docs, stemDocument)
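# Build a term-document matrix and sum each term's counts across documents
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Show the 10 most frequent words
head(d, 10)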
# Fix the random seed so the word cloud layout is reproducible
set.seed(1234)
# Generate the word cloud
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
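Before you build the corpus, you can check whether the accents survived the import. The sketch below writes a small hypothetical sample file containing raw UTF-8 bytes. It stands in for the file chosen with file.choose(). The sketch then reads the file back with encoding = "UTF-8". The readLines() argument only marks the strings and does not re-encode them. validUTF8() and charToRaw() then confirm that the bytes are intact.

```r
# Hypothetical sample file standing in for the user's .txt
tmp <- tempfile(fileext = ".txt")
# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the session locale
writeLines(c("M\u00e9xico", "a\u00f1o"), tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8; it does not convert the bytes
text <- readLines(tmp, encoding = "UTF-8")
print(Encoding(text))        # encoding marks set by readLines()
print(all(validUTF8(text)))  # [1] TRUE

# The first line holds exactly the bytes of the literal "México"
print(identical(charToRaw(text[1]), charToRaw("M\u00e9xico")))  # [1] TRUE
unlink(tmp)
```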
Answer (score: 1):
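The problem was that I hadn't set my system locale. After several attempts to switch it to Spanish, I got this warning: "OS reports request to set locale to "sp_MX.UTF-8" cannot be honored". (The ISO code for Spanish is "es", so the Mexican Spanish locale would be named "es_MX.UTF-8", if it is installed.) So I ended up using this: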
Sys.setlocale(category = "LC_ALL", locale = "en_US.UTF-8")
After that, everything worked.
Thanks to @hrbrmstr for pointing me to the actual problem :)
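A slightly more defensive version of the locale fix tries several UTF-8 locales in turn and keeps the first one the operating system accepts. When a request cannot be honored, Sys.setlocale() returns an empty string and emits a warning, so the loop can detect failures. This is only a sketch. The helper name set_utf8_locale is hypothetical, and so is the candidate list. Which locales exist depends on the system; on Linux, `locale -a` lists the installed ones.

```r
# Hypothetical helper: try each candidate locale and return the first one accepted
set_utf8_locale <- function(candidates = c("es_MX.UTF-8", "en_US.UTF-8", "C.UTF-8")) {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (with a warning) if the OS refuses the request
    res <- suppressWarnings(Sys.setlocale(category = "LC_ALL", locale = loc))
    if (nzchar(res)) return(loc)
  }
  NA_character_  # none of the candidates is available
}

chosen <- set_utf8_locale()
print(chosen)
# TRUE when R's character handling now uses UTF-8
print(l10n_info()[["UTF-8"]])
```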