I am working on text mining in R. After removing punctuation, numbers, URLs, and stopwords, I have a few documents in my corpus:
myStopwords <- setdiff(myStopwords, c("r", "big"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpus <- tm_map(myCorpus, stripWhitespace)
myCorpusCopy <- myCorpus
for (i in c(1:2, 320))
{
cat(paste0("[", i, "] "))
writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}
[1] examples calling java code r
[2] simulating mapreduce r big data analysis using flights data
rbloggers
[320] r reference card data mining now cran lists many useful r
functions packages data mining applications
After that, I try to stem the documents and then complete the stems, as follows:
myCorpus <- tm_map(myCorpus, stemDocument)
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)
When I run the for loop again, it prints NA for every document, as shown below:
for (i in c(1:2, 320))
{
cat(paste0("[", i, "] "))
writeLines(strwrap(as.character(myCorpus[[i]]), 60))
}
[1] NA
[2] NA
[320] NA
Any idea where I went wrong?
Answer 0: (score: 0)
I reproduced your problem using a built-in dataset:
library(tm)  # provides the crude dataset, tm_map, stemDocument and stemCompletion
data("crude")
myCorpus <- as.VCorpus(crude)
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=myCorpusCopy)
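I found that after the last line, the elements of the myCorpus object no longer have fields such as meta and content in their structure. In my case, the elements are now named character vectors.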
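To make the structural change concrete, here is a minimal base R sketch. Both objects are hypothetical stand-ins built by hand, not real tm documents, and the text in them is invented for illustration; the value NA is an assumption that mirrors the output shown in the question.

```r
# Hypothetical stand-in for a document BEFORE stemCompletion:
# a list with `content` and `meta` fields (not a real tm object).
doc_before <- list(content = "crude oil prices fell", meta = list(id = "1"))

# Hypothetical stand-in for the same element AFTER stemCompletion:
# a named character vector whose name carries the text.
doc_after <- c("crude oil price fell" = NA_character_)

is.list(doc_before)       # the list-like document structure
is.character(doc_after)   # a plain character vector instead
names(doc_after)          # the text now lives in the names

# str() shows the difference between the two structures at a glance
str(doc_before)
str(doc_after)
```

Running str() on an element of your own corpus before and after the stemCompletion step should show a comparable change.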
You can still access the elements:
myCorpus[[1]]
Diamond Shamrock Corp said that\neffect today it had cut it contract price for crude oil by\n1.50 dlrs a barrel.\n The reduct bring it post price for West Texas\nIntermedi to 16.00 dlrs a barrel, the copani said.\n "The price reduct today was made in the light of falling\noil product price and a weak crude oil market," a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil compani that\nhav cut it contract, or posted, price over the last two days\ncit weak oil markets.\n Reuter
"content"
<NA>
"meta"
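However, the as.character() method reads the opposite part of the element's new structure (inspect it with str()), not the part you want. The body text is now actually stored in names.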
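The effect can be checked in base R without tm. This is a small sketch using a hand-made named character vector as a hypothetical stand-in for one corpus element; the text and the NA value are assumptions chosen to match the output in the question.

```r
# Hypothetical stand-in for one element after stemCompletion:
# the name holds the text, the value is NA.
doc <- c("examples calling java code r" = NA_character_)

# as.character() keeps the values and drops the names, so the text is lost.
values_only <- as.character(doc)
print(values_only)

# names() returns the part that actually holds the text.
text_only <- names(doc)
print(text_only)
```

This is why the loop in the question prints NA: as.character() returns the values, while the text sits in the names.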
I was able to fix the loop like this:
for (i in c(1:2, length(myCorpus)))
{
cat(paste0("[", i, "] "))
writeLines(strwrap(as.character(names(myCorpus[[i]])), 60))
}
[1] Diamond Shamrock Corp said that effect today it had cut it contract price for crude oil by 1.50 dlrs a barrel. The reduct bring it post price for West Texas Intermedi to 16.00 dlrs a barrel, the copani said. "The price reduct today was made in the light of falling oil product price and a weak crude oil market," a company spokeswoman said. Diamond is the latest in a line of U.S. oil compani that hav cut it contract, or posted, price over the last two days cit weak oil markets. Reuter
[2] OPEC may be forc to meet befor a schedul June session to readdress it product cutting agr if the organ want to halt the current slide in oil prices, oil industri analyst said. "The movement to higher oil price was never to be as easy a OPEC thought. They may need an emerg meet to sort out th problems," said Daniel Yergin, director of Cambridg Energy Research Associates, CERA. Analyst and oil industri sourc said the problem OPEC face is excess oil suppli in world oil markets. "OPEC problem is not a price problem but a production issu and must be address in that way," said Paul Mlotok, oil analyst with Salomon Brother Inc. He said the market earlier optim about OPEC and its abl to keep product under control have given way to a pessimist outlook that the organ must address soon if it wish to regain the initi in oil prices. But some other analyst were uncertain that even an emerg meet would address the problem of OPEC production abov the 15.8 mln bpd quota set last December. "OPEC has to learn that in a buyer market you cannot have deem quotas, fix price and set differentials," said the region manag for one of the major oil compani who spoke on condit that he not be named. "The market is now tri to teach them that lesson again," he added. David T. Mizrahi, editor of Mideast reports, expect OPEC to meet befor June, although not immediately. However, he is not optimist that OPEC can address it princip problems. "They will not meet now as they tri to take advantag of the wint demand to sell their oil, but in late March and April when demand slackens," Mizrahi said. But Mizrahi said that OPEC is unlik to do anyth more than reiter it agreement to keep output at 15.8 mln bpd." Analyst said that the next two month will be critic for OPEC abil to hold togeth price and output. "OPEC must hold to it pact for the next six to eight weeks sinc buyer will come back into the market then," said Dillard Sprigg of Petroleum Analysi Ltd in New York. But Bijan Moussavar-Rahmani of Harvard Univers Energy and Environ Polici Center said that the demand for OPEC oil ha been rise through the first quarter and this may have prompt excess in it production. "Demand for their (OPEC) oil is clear abov 15.8 mln bpd and is probabl closer to 17 mln bpd or higher now so what we ar see character as cheat is OPEC meet this demand through current production," he told Reuter in a telephon interview. Reuter
[20] Argentin crude oil product was down 10.8 pct in Januari 1987 to 12.32 mln barrels, from 13.81 mln barrel in Januari 1986, Yacimiento Petrolifero Fiscales said. Januari 1987 natur gas output total 1.15 billion cubic metrers, 3.6 pct higher than 1.11 billion cubic metr produced in Januari 1986, Yacimiento Petrolifero Fiscal added. Reuter