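How do I write corpus data to a file in R after removing stop words?

Date: 2016-03-22 11:59:56

Tags: r

I am new to R programming and have written a program that removes stop words: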

require(tm)
data<-read.csv('remm.corp')
print(data)

path<-"/home/cloudera/saicharan/R/text.txt"
aaa<-readLines(path)

bbb<-Corpus(VectorSource(aaa))
#inspect(bbb)

bbb<-tm_map(bbb,removeWords,stopwords("english"))
write.csv(as.character(bbb[[1]]),'e.csv')

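I tried to write the data to a file, but only one line gets written. How should I modify the code so that all the lines are written? Please help.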

1 Answer:

Answer 0 (score: 0)

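One way to save a corpus is to convert it to a data frame first and then save that as a CSV file. Since you did not provide sample text, I created some reproducible text. The code below first creates a corpus from the sample text and then removes the stop words. A corpus is structured as a list, and the text of each document is stored in its content element. The code extracts only the text and builds a data frame from it. Finally, the data frame is saved.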
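To make the list structure concrete, here is a minimal sketch. It assumes a `VCorpus` built from a hypothetical two-document vector (`docs` is not from the question). It suggests why `bbb[[1]]` in the question produces a single line: `[[1]]` selects only the first document, so only that document is passed to `write.csv`.

```r
library(tm)

# Hypothetical two-document input, used only for illustration
docs <- c("Love is merely a madness.", "Men have died from time to time.")
corpus <- VCorpus(VectorSource(docs))

n_docs <- length(corpus)        # number of documents in the corpus
first_doc <- corpus[[1]]        # [[1]] selects only the first document
first_text <- as.character(first_doc)  # text of that single document

# Text of every document, one element per document
all_text <- vapply(corpus, as.character, character(1))
```

Writing `all_text` rather than `first_text` is what gets every document into the output file.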

Code:

#Reproducible data - Quotes from  As You Like It by  William Shakespeare
SampleText <- c("All the world's a stage,And all the men and women merely players;They have their exits and their entrances;And one man in his time plays many parts,
His acts being seven ages.",
          "Men have died from time to time, and worms have eaten them, but not for love.",
          "Love is merely a madness.")

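library(tm)
# VCorpus keeps each document as a list with a `content` element;
# in newer tm versions Corpus() may return a SimpleCorpus instead
mycorpus <- VCorpus(VectorSource(SampleText)) # Corpus creation
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english")) # Remove stop words

# Extract the text of every document into a one-column data frame
mycorpus_dataframe <- data.frame(text = unlist(sapply(mycorpus, `[`, "content")),
                                 stringsAsFactors = FALSE)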

write.csv(mycorpus_dataframe,'mycorpus_dataframe.csv', row.names=FALSE)

Output:

> print(mycorpus_dataframe , row.names=FALSE)
                                                                                                                                     text
 All  world's  stage,And   men  women merely players;They   exits   entrances;And one man   time plays many parts,\nHis acts  seven ages.
                                                                                          Men  died  time  time,  worms  eaten ,    love.
                                                                                                                   Love  merely  madness.

>
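If a plain text file with one document per line is enough, the data frame step can probably be skipped. The sketch below is one possible variant, not the only way to do it. It assumes a `VCorpus`, uses hypothetical input lines in place of `readLines(path)`, and writes to a hypothetical temporary file. It also applies tm's `stripWhitespace` to collapse the doubled spaces that removed stop words leave behind, as seen in the output above.

```r
library(tm)

# Hypothetical input lines standing in for readLines() on the original text file
lines_in <- c("All the world's a stage", "Love is merely a madness.")

corpus <- VCorpus(VectorSource(lines_in))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)  # collapse runs of whitespace

# One cleaned string per document
cleaned <- vapply(corpus, as.character, character(1))

# Hypothetical output path; writeLines() writes one element per line
out_file <- file.path(tempdir(), "cleaned.txt")
writeLines(cleaned, out_file)
```

Because `writeLines` writes each element of the character vector on its own line, the output file should contain as many lines as the corpus has documents.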