我正在对数千个文档进行文本挖掘(基本上是在进行频率计数),并且想知道是否还有其他方法可以加快以下过程?目前,运行整个分析需要10个多小时。谢谢(来自R初学者)。
sessionInfo()
#R version 3.2.3 (2015-12-10)
library(bitops)
library(RCurl)
library(XML)
library(stringr)
library(tm)
setwd("F:/testing_folder")
path = "F:/testing_folder"
file.names <- dir(path, pattern =".txt")
filename <- vector()
totalword <- vector()
system.time(
for(i in 1:length(file.names)){
text.v <- scan(file.names[i], what="character", sep="\n",encoding = "UTF-8")
report.v <- paste(text.v, collapse=" " )
#Count total number of words
words.l <- strsplit(report.v, "\\W")
word.v <- unlist(words.l)
not.blanks.v <- which(word.v!="")
word.v <- word.v[not.blanks.v]
totalword <- append(totalword,length(word.v))
filename <- append(filename,print(file.names[i]))
x <- data.frame(filename,totalword)
write.csv(x, file= "results.csv") #export results
}
)
答案 0 :(得分:0)
你从以下内容得到什么?
Rprof("profile1.out", line.profiling=TRUE)
source("http://pastebin.com/raw/kFGCse5s")
Rprof(NULL)
proftable("profile1.out", lines=10)