I am new to R programming and have written a program that removes stop words:
require(tm)
data<-read.csv('remm.corp')
print(data)
path<-"/home/cloudera/saicharan/R/text.txt"
aaa<-readLines(path)
bbb<-Corpus(VectorSource(aaa))
#inspect(bbb)
bbb<-tm_map(bbb,removeWords,stopwords("english"))
write.csv(as.character(bbb[[1]]),'e.csv')
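I tried to write the data to a file, but only one line gets written. How should I modify the code so that it writes multiple lines? Please help.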
Answer 0 (score: 0)
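One way to save a corpus is to convert it to a data frame first and then save that as a CSV file. Since you did not provide sample text, I created some reproducible text. The code below first creates a corpus from the sample text and then removes the stop words. The corpus is structured as a list, with the text of each document stored in its content element. The code extracts just the text and builds a data frame from it. Finally, we save the data frame.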
Code:
#Reproducible data - Quotes from As You Like It by William Shakespeare
SampleText <- c("All the world's a stage,And all the men and women merely players;They have their exits and their entrances;And one man in his time plays many parts,
His acts being seven ages.",
"Men have died from time to time, and worms have eaten them, but not for love.",
"Love is merely a madness.")
library(tm)
mycorpus <- Corpus(VectorSource(SampleText)) # Corpus creation
mycorpus <-tm_map(mycorpus,removeWords,stopwords("english"))
mycorpus_dataframe <- data.frame(text=unlist(sapply(mycorpus, `[`, "content")),
                                 stringsAsFactors=FALSE)
write.csv(mycorpus_dataframe,'mycorpus_dataframe.csv', row.names=FALSE)
Output:
> print(mycorpus_dataframe , row.names=FALSE)
text
All world's stage,And men women merely players;They exits entrances;And one man time plays many parts,\nHis acts seven ages.
Men died time time, worms eaten , love.
Love merely madness.
>
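A likely reason the code in the question wrote only one line is that `bbb[[1]]` selects just the first document of the corpus. The base R sketch below illustrates the difference without needing tm: the character vector `cleaned` is a hypothetical stand-in for text already extracted from a corpus, and the file names come from `tempfile()` rather than from the original post.

```r
# Hypothetical stand-in for the cleaned text of three documents.
cleaned <- c("All world's stage", "Men died time time", "Love merely madness.")

# Indexing with [[1]] keeps only the first document, so a single row is written.
first_only <- tempfile(fileext = ".csv")
write.csv(data.frame(text = cleaned[[1]], stringsAsFactors = FALSE),
          first_only, row.names = FALSE)

# Passing the whole vector writes one row per document.
all_docs <- tempfile(fileext = ".csv")
write.csv(data.frame(text = cleaned, stringsAsFactors = FALSE),
          all_docs, row.names = FALSE)

# Read both files back and count the data rows.
rows_first <- nrow(read.csv(first_only, stringsAsFactors = FALSE))
rows_all <- nrow(read.csv(all_docs, stringsAsFactors = FALSE))
```

Comparing `rows_first` with `rows_all` shows that the number of rows written follows the number of elements passed to `write.csv`, which is why the whole corpus has to be converted before saving.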