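I'm trying to scrape the following web page in R using the XML, RCurl, or httr library: http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB
The page opens correctly in my browser. Here is how I'm trying to scrape it: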
library("XML")
#this works fine (QB projections)
qb <- readHTMLTable("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/", header=1)$fantasy_table
#this does not (RB projections)
rb <- readHTMLTable("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB", header=1)$fantasy_table
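library("RCurl")
# this fails with the same error (note: htmlParse() comes from XML, not RCurl)
htmlParse("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB")
library("httr")
# this returns status code 404
GET("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB")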
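With readHTMLTable and htmlParse I get the following error: "Error: failed to load HTTP resource". With GET I get status code 404, which means the resource could not be found and suggests something may be wrong with how I'm sending the request. Since the page opens fine in my browser, I'm not sure what the problem is. Maybe it's a different kind of file than these functions expect? Any ideas?
Ideally, the scrape would work for all 146 entries (not just the first 25).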
Answer:
Use RCurl to download the page first, then pass the returned HTML text to readHTMLTable:
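library(RCurl)  # provides getURL()
library(XML)    # provides readHTMLTable()
# Download the page as a character string with getURL(), then parse that string
readHTMLTable(getURL("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB"), header = 1)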
> head(readHTMLTable(getURL("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB"), header = 1)$fantasy)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 DeMarco Murray DAL 20.3 17.3 106 6.13 0.85 5.4 39 0.19 0.2
2 Jamaal Charles KC 18.5 18.4 70 3.8 0.4 6.7 59 0.6 0.23
3 LeSean McCoy PHI 17.8 22.2 102 4.59 0.81 2.7 24 0.13 0.22
4 Le'Veon Bell PIT 17.1 25.1 95 3.78 0.65 3.5 30 0.2 0.26
5 Danny Woodhead SD 16.6 9.5 47 4.95 0.27 5.7 60 0.76 0.14
6 Marshawn Lynch SEA 15.8 18.6 79 4.25 0.85 3.1 24 0.12 0.19
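The fix works because it splits downloading from parsing: readHTMLTable() also accepts a character string of HTML content, so once getURL() has returned the page text, the parsing step makes no HTTP request of its own. The offline sketch below illustrates that parsing step. The HTML string is a hypothetical stand-in for what getURL() would return; the `fantasy_table` id mirrors the name used in the question, while the column headers (Player, Team, Pts) are made up for illustration and are not the site's actual headers.

```r
library(XML)

# Hypothetical stand-in for the text getURL() returns. The table id mirrors
# the fantasy_table name from the question; the headers are illustrative only.
html <- paste0(
  "<html><body><table id='fantasy_table'>",
  "<tr><th>Player</th><th>Team</th><th>Pts</th></tr>",
  "<tr><td>DeMarco Murray</td><td>DAL</td><td>20.3</td></tr>",
  "<tr><td>Jamaal Charles</td><td>KC</td><td>18.5</td></tr>",
  "</table></body></html>"
)

# Parse the string directly; no network access happens here.
# The returned list is named after each table's id attribute.
tables <- readHTMLTable(html, header = TRUE, stringsAsFactors = FALSE)
rb <- tables$fantasy_table

# Cells come back as text; convert numeric columns explicitly
# (as.character() first guards against factor columns on older R versions).
rb$Pts <- as.numeric(as.character(rb$Pts))
print(rb)
```

In the real scrape, `html` would simply be the result of `getURL(...)` on the RB page.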