I'm pulling tables from a website into RStudio. Here is my code:
```r
library(XML)
library(RCurl)

get_Name_code_nation <- function(k, olympicyear) {
  # This function takes a number k, an index telling us which page (which Olympics)
  # to read, and a year, and returns the athletes table for that Olympics.
  # I will loop over this function to get a table of all Olympics.

  # First stage: read the Olympics page with all the sports in it
  URL <- paste("http://www.databaseolympics.com/games/gamesyear.htm?g=", k, sep = "")
  parsed.page <- htmlParse(getURL(URL))
  URL.vec <- xpathSApply(parsed.page, "//a[starts-with(@href, '/games/')]",
                         xmlGetAttr, 'href')

  # Second stage: define variables
  athlete_id <- c()
  tab <- data.frame()
  for (i in 2:length(URL.vec)) {
    temp_URL <- paste('http://www.databaseolympics.com', URL.vec[i], sep = "")
    tab <- rbind(tab, readHTMLTable(temp_URL, which = 3,
                                    colClasses = list(NULL, 'character', 'factor', NULL, NULL),
                                    stringsAsFactors = FALSE))
    # Part of the loop where we get the athlete codes
    parsed.page.codes <- htmlParse(getURL(temp_URL))
    player.links <- xpathSApply(parsed.page.codes, "//a[starts-with(@href, '/players/')]",
                                xmlGetAttr, 'href')
    codes.vec <- player.links[2:length(player.links)]
    athlete_id <- c(athlete_id, sub("[^=]*=", "", codes.vec))
  }
  tab <- data.frame(tab, athlete_id)
  return(tab)
}

numbers <- c(1:26, 47)
years <- c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932, 1936, seq(1948, 2008, 4))
athletes.df <- data.frame()
for (i in 1:length(numbers)) {
  athletes.df <- rbind(athletes.df, get_Name_code_nation(numbers[i], years[i]))
}
```
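I have run this program many times, and most of the time everything works fine: the code runs without problems and the data comes in as it should. The problem is that occasionally I get a strange error, the program seems to stop in the middle, and only some of the data comes in. These are the different errors I get: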
```
Error in function (type, msg, asError = TRUE) :
  Recv failure: Connection was reset

Error in function (type, msg, asError = TRUE) :
  connection timed out 80

Error in UseMethod("xmlNamespaceDefinitions") :
  no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
```
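The first two messages come from libcurl, which RCurl's `getURL` uses, and they describe network-level failures (the connection being reset or timing out) rather than a problem in the parsing logic. The third one is probably a knock-on effect: a page that did not download properly ends up as `NULL` before it reaches the XML functions. If the failures really are transient, one option is to retry the download a few times before giving up. Below is a minimal sketch of such a retry loop; `with_retry()`, the number of attempts and the delay are hypothetical choices, not part of RCurl or XML.

```r
# Hypothetical helper (not part of RCurl/XML): evaluate f() up to `attempts`
# times, pausing between failed attempts. If every attempt fails, the last
# error is re-signalled so the caller still sees it.
with_retry <- function(f, attempts = 3, wait = 2) {
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back the value of f()
    }
    last_error <- result
    message(sprintf("Attempt %d of %d failed: %s",
                    i, attempts, conditionMessage(result)))
    if (i < attempts) Sys.sleep(wait * i)  # wait a little longer each time
  }
  stop(last_error)
}
```

In the function above, `getURL(URL)` would become `with_retry(function() getURL(URL))`, and the same wrapper could go around the `getURL(temp_URL)` and `readHTMLTable(temp_URL, ...)` calls.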
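If it never worked, I would say there is something wrong with the code, but it works most of the time, and I don't know what makes it work or fail.
Does anyone know whether this is related to the website? Is there anything I can do?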
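Because the loop stops at the first error, a single bad page currently ends the whole run partway through. A minimal sketch of a different loop structure, assuming it is acceptable to skip a failed page and re-run it later: each page's result is collected in a list, failures are recorded instead of aborting, and the tables are bound once at the end. `scrape_all()` and its `fetch` argument are hypothetical names used only for illustration.

```r
# Hypothetical sketch: call fetch(k) for every index, keep the successful
# data frames, and record the indices that failed instead of stopping.
scrape_all <- function(indices, fetch) {
  results <- vector("list", length(indices))  # preallocated, all NULL
  failed <- c()
  for (j in seq_along(indices)) {
    res <- tryCatch(
      fetch(indices[j]),
      error = function(e) {
        message(sprintf("Index %s failed: %s", indices[j], conditionMessage(e)))
        NULL
      }
    )
    if (is.null(res)) {
      failed <- c(failed, indices[j])
    } else {
      results[[j]] <- res  # only assign on success; assigning NULL would drop the slot
    }
  }
  ok <- !vapply(results, is.null, logical(1))
  list(data = do.call(rbind, results[ok]), failed = failed)
}
```

With the question's objects it could be called as `res <- scrape_all(numbers, function(k) get_Name_code_nation(k, years[match(k, numbers)]))`; `res$data` then holds everything that downloaded, and `res$failed` lists the pages to try again.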