Question

I used the following R script:

library(XML)

url <- "http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"
check <- readHTMLTable(url, header = TRUE)
check$"Career summary"
check <- check$"Career summary"

I can only get the first 11 observations.

Can anyone explain why I cannot scrape the whole table?

Answer 1

To get the contents of all the tables on the page:

library(XML)

url="http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"

content <- htmlParse(url)

tbody <- xpathSApply(content, "//tbody")

lapply(tbody, function(x) readHTMLTable(x, header = TRUE))

Answer 2

As @Wietze314 said, there are multiple tables on that page. You can get a list of all the tables that I think you are interested in:

library(XML)

url <- "http://stats.espncricinfo.com/ci/engine/player/253802.html?class=3;orderby=default;template=results;type=batting"

check <- htmlParse(url)

# Each <tbody> element is read as its own table
tableNodes <- getNodeSet(check, '//tbody')
tbList <- lapply(tableNodes, readHTMLTable)

tbList contains 22 data.frames for you to work with.
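
To try the same approach without fetching the page, here is a minimal sketch on an inline HTML string. The markup is a made-up stand-in, not the real ESPNcricinfo page, and it assumes (unverified against that page) that the data is split across several tbody sections. The names html, doc, tbody_nodes and tb_list are only illustrative.

```r
library(XML)

# Hypothetical stand-in for the real page: one table split into two <tbody> sections
html <- '<html><body><table>
<tbody><tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2</td></tr></tbody>
<tbody><tr><td>c</td><td>3</td></tr><tr><td>d</td><td>4</td></tr></tbody>
</table></body></html>'

# asText = TRUE tells htmlParse that the argument is markup, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# Select every <tbody> element, wherever it sits in the document
tbody_nodes <- getNodeSet(doc, "//tbody")

# Read each section separately; header = FALSE because every row here is data
tb_list <- lapply(tbody_nodes, readHTMLTable, header = FALSE)

length(tb_list)
```

Each element of tb_list is a data.frame, so you can pick the one you need by position, for example tb_list[[2]].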

Unable to fully scrape an HTML table using R
