Scraping a specific table from a web page using R

Time: 2013-11-05 19:08:31

Tags: r xml web-scraping

I need to extract a table from the following URL: http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2013;type=year

I only need the table with the "Match results" caption.

I used the following code:

library(XML)
ODItable <- readHTMLTable('http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2013;type=year')

How do I extract that specific table from the result?

1 Answer:

Answer 0: (score: 0)

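You almost have it! Called on a whole page, readHTMLTable returns a list of the tables it finds, so you can pick out the one you want by name: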

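library(XML)
url <- 'http://stats.espncricinfo.com/ci/engine/records/team/match_results.html?class=2;id=2013;type=year'
# readHTMLTable() returns a list with one element per <table> on the page
ODItable <- readHTMLTable(url)
# Select the table by name; backticks are needed because the name contains a space
head(ODItable$`Match results`)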

     Team 1    Team 2    Winner    Margin    Ground   Match Date  Scorecard
1     India  Pakistan  Pakistan   85 runs   Kolkata  Jan 3, 2013 ODI # 3315
2     India  Pakistan     India   10 runs     Delhi  Jan 6, 2013 ODI # 3316
3 Australia Sri Lanka Australia  107 runs Melbourne Jan 11, 2013 ODI # 3317
4     India   England   England    9 runs    Rajkot Jan 11, 2013 ODI # 3318
5 Australia Sri Lanka Sri Lanka 8 wickets  Adelaide Jan 13, 2013 ODI # 3319
6     India   England     India  127 runs     Kochi Jan 15, 2013 ODI # 3320
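
If you are not sure what the tables on a page are called, it can help to inspect the names of the returned list first. The following is a minimal sketch of that idea, not the exact page from the question: the inline HTML is a made-up stand-in so the example runs offline, and it assumes (as the output above suggests) that the XML package names each list element after the table's caption.

```r
library(XML)

# Hypothetical stand-in for a downloaded page: two captioned tables.
html <- '
<html><body>
<table>
  <caption>Other table</caption>
  <thead><tr><th>Key</th><th>Value</th></tr></thead>
  <tbody><tr><td>a</td><td>1</td></tr></tbody>
</table>
<table>
  <caption>Match results</caption>
  <thead><tr><th>Team 1</th><th>Team 2</th><th>Winner</th></tr></thead>
  <tbody>
    <tr><td>India</td><td>Pakistan</td><td>Pakistan</td></tr>
    <tr><td>India</td><td>Pakistan</td><td>India</td></tr>
  </tbody>
</table>
</body></html>'

# Parse the HTML string instead of fetching a URL
doc <- htmlParse(html, asText = TRUE)

# One list element per <table>; keep text columns as character vectors
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# List the available table names before choosing one
print(names(tables))

# [[ ]] with a string avoids the backticks that $ needs for names with spaces
match_results <- tables[["Match results"]]
print(match_results)
```

On the live page, the same print(names(ODItable)) call shows which names are available; tables without a caption or id may show up under a placeholder name, in which case selecting by position (for example ODItable[[1]]) is an alternative.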