Error converting text to lowercase with tm_map(..., tolower)

Asked: 2012-11-30 06:35:41

Tags: r tm lowercase term-document-matrix

I am trying to use tm_map to convert my text to lowercase, but it gives the following error. How can I get around this?

 require(tm)
 byword<-tm_map(byword, tolower)

Error in UseMethod("tm_map", x) : 
  no applicable method for 'tm_map' applied to an object of class "character"

4 answers:

Answer 0 (score: 101)

Use the base R function tolower():

tolower(c("THE quick BROWN fox"))
# [1] "the quick brown fox"
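
The error in the question says that byword has class "character", and tm_map() only has methods for corpus objects, so plain tolower() is enough when no corpus is needed. A minimal sketch, assuming byword is an ordinary character vector (the sample strings below are made up for illustration):

```r
# Hypothetical stand-in for the question's `byword` object,
# assumed here to be a plain character vector.
byword <- c("THE quick BROWN fox", "Jumps OVER the lazy DOG")

# tm_map() dispatches on corpus classes, so check what we actually have.
class(byword)

# tolower() is vectorised: it lowercases every element of the vector.
byword_lower <- tolower(byword)
byword_lower
```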

Answer 1 (score: 6)

Expanding my comment into a more detailed answer here: you have to wrap tolower inside content_transformer so that it does not mess up the VCorpus object. For example:

> library(tm)
> data('crude')
> crude[[1]]$content
[1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n    The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n    \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n    Diamond is the latest in a line of U.S. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
> tm_map(crude, content_transformer(tolower))[[1]]$content
[1] "diamond shamrock corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n    the reduction brings its posted price for west texas\nintermediate to 16.00 dlrs a barrel, the copany said.\n    \"the price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n    diamond is the latest in a line of u.s. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n reuter"
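
The same idea can be applied to the character vector from the question by wrapping it in a corpus first. This is only a sketch: the sample strings and the names corp and corp_lower are assumptions for illustration, and it presumes a tm version that provides content_transformer().

```r
library(tm)

# Hypothetical input: a plain character vector like the question's `byword`.
byword <- c("THE quick BROWN fox", "Jumps OVER the lazy DOG")

# tm_map() has no method for character vectors, so build a corpus first.
corp <- VCorpus(VectorSource(byword))

# content_transformer() applies tolower() to each document's content
# and keeps the result as a document rather than a bare string.
corp_lower <- tm_map(corp, content_transformer(tolower))

# Inspect the first document's text, as in the answer above.
corp_lower[[1]]$content
```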

Answer 2 (score: 3)

myCorpus <- Corpus(VectorSource(byword))
myCorpus <- tm_map(myCorpus , tolower)

print(myCorpus[[1]])

Answer 3 (score: 1)

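Using tolower this way has an undesirable side effect: if you later try to create a term-document matrix from the corpus, it will fail. This is because a recent change in tm cannot handle the return type of tolower (a plain character string rather than a document). Instead, use: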

myCorpus <- tm_map(myCorpus, PlainTextDocument)
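
One way to check that the corpus is still usable is to build the term-document matrix right after lowercasing. The sketch below uses the content_transformer(tolower) approach from Answer 1 rather than the PlainTextDocument call above. The sample strings and the name docs are assumptions for illustration, and the behaviour may differ between tm versions.

```r
library(tm)

# Hypothetical sample documents, used only for illustration.
docs <- c("THE quick BROWN fox", "Jumps OVER the lazy DOG")
myCorpus <- VCorpus(VectorSource(docs))

# Lowercase while keeping each element a document object.
myCorpus <- tm_map(myCorpus, content_transformer(tolower))

# If the documents are still valid, this call should succeed.
tdm <- TermDocumentMatrix(myCorpus)

# Number of documents and the extracted terms.
nDocs(tdm)
Terms(tdm)
```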