R URL encoding issue?

Date: 2014-02-12 14:33:30

Tags: r urlencode

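I am trying to use the RCurl package to fetch a data table from a website. My code successfully retrieves the URL you land on by clicking through the site: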

http://statsheet.com/mcb/teams/air-force/game_stats/

As soon as you select a previous season (which is what I want), my code no longer works.

Example link: http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013

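My guess is that this has to do with the reserved characters in the season-specific address. I have tried URLencode as well as encoding the address by hand, but neither worked.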

My code:

library(RCurl)
library(XML)

# Define URL
theurl <- URLencode("http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013", reserved=TRUE)

webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)

pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)

# Extract table header and contents
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results <- xpathSApply(pagetree,"//*/table[1]/tbody/tr/td", xmlValue)

content <- as.data.frame(matrix(results, ncol = 19, byrow = TRUE))

testtablehead <- c("W/L","Opponent",tablehead[c(2:18)])
names(content) <- testtablehead

The relevant error returned by R:

Error in function (type, msg, asError = TRUE)  : 
Could not resolve host: http%3a%2f%2fstatsheet.com%2fmcb%2fteams%2fair-force%2fgame_stats%3fseason%3d2012-2013; No data record of requested type

Does anyone know what the problem is and how to fix it?

1 Answer:

Answer 0: (score: 1)

Skip the encoding, which is not needed here, and pass the URL to the parser directly:

library(XML)
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
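To see why the encoding step breaks the request, it may help to compare what URLencode returns with and without reserved=TRUE. The following is only a sketch using base R's utils::URLencode; the variable names (plain, encoded, base, season, built) are made up for illustration, and the last part assumes you want to build the URL from a season string. With reserved=TRUE the characters ":", "/", "?" and "=" are percent-encoded too, so the whole string no longer looks like a URL and the scheme and host cannot be parsed out of it, which matches the "Could not resolve host" error in the question.

```r
u <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

# Default (reserved = FALSE): reserved characters are left alone.
# This URL contains nothing that needs escaping, so it comes back unchanged.
plain <- URLencode(u)

# reserved = TRUE: ":", "/", "?" and "=" are percent-encoded as well,
# which destroys the structure of a complete URL.
encoded <- URLencode(u, reserved = TRUE)
print(encoded)

# If a query value might contain special characters, encode only that
# value and paste it onto an unencoded base URL (hypothetical helper names).
base   <- "http://statsheet.com/mcb/teams/air-force/game_stats"
season <- "2012-2013"
built  <- paste0(base, "?season=", URLencode(season, reserved = TRUE))
print(built)
```

In short, reserved=TRUE is meant for individual components such as a query value, not for a complete URL.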