Question

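Hi, I'm new to R. I'm building on two guides from the web, and I figured out how to automate the data-mining script, but the data gets overwritten every time the code runs. I'd like it to append instead. Can anyone point me in the right direction?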

Here is the script itself:

# loading the package is required once each session
require(XML)

# initialize a storage variable for Twitter tweets
mydata.vectors <- character(0)

# paginate to get more tweets
for (page in c(1:15))
{
    # search parameter
    twitter_q <- URLencode('#google OR #apple')
    # construct a URL
    twitter_url = paste('http://search.twitter.com/search.atom?q=',twitter_q,'&rpp=100&page=', page, sep='')
    # fetch remote URL and parse
    mydata.xml <- xmlParseDoc(twitter_url, asText=FALSE)
    # extract the titles
    mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue, namespaces =c('s'='http://www.w3.org/2005/Atom'))
    # aggregate new tweets with previous tweets
    mydata.vectors <- c(mydata.vector, mydata.vectors)
}

# how many tweets did we get?
length(mydata.vectors)

Answer 1

I think what you want is to save the results to disk between runs. So, something like this at the start:

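# start empty on the first run, otherwise restore the saved vector
# (braces keep `else` attached to the `if` when run at top level)
if (!file.exists('path/to/file')) {
    mydata.vectors <- character(0)
} else {
    load('path/to/file')
}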

And something like this at the end:

save(mydata.vectors, file='path/to/file')

should do the trick. Of course, you can get more sophisticated with the type of file you save, and so on.
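
As one possible way to get "more sophisticated", here is a minimal sketch that wraps the load, append, and save steps in a single function and uses saveRDS()/readRDS() in place of save()/load(). The names append_results and demo_path are hypothetical and not from the original script, and dropping exact duplicates with unique() is an added assumption, since repeated runs of the same search may return overlapping tweets.

```r
# Hypothetical helper: merge a new batch into the results stored at `path`.
append_results <- function(new_items, path) {
    # Start from an empty vector on the first run, otherwise read the earlier results
    stored <- if (file.exists(path)) readRDS(path) else character(0)
    # Put the new batch first, as the question's loop does, and drop exact duplicates
    combined <- unique(c(new_items, stored))
    # Write the merged vector back so the next run picks it up
    saveRDS(combined, file = path)
    invisible(combined)
}

# Two simulated runs against a temporary file (no network access needed)
demo_path <- tempfile(fileext = ".rds")
append_results(c("tweet a", "tweet b"), demo_path)
all_items <- append_results(c("tweet b", "tweet c"), demo_path)
length(all_items)  # 3
```

In the original script, this would mean calling append_results(mydata.vectors, 'path/to/file') once after the loop, so each run adds to the stored vector rather than replacing it.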
