rvest function html_nodes returns {xml_nodeset (0)}

Asked: 2018-07-07 04:03:26

Tags: r xpath web-scraping css-selectors rvest

I am trying to scrape data from the following website into a data frame:

http://stats.nba.com/game/0041700404/playbyplay/

I want to build a table that contains the game date, the score over the course of the game, and the team names.

I am using the following code:

library(rvest)

game1 <- read_html("http://stats.nba.com/game/0041700404/playbyplay/")

#Extracts the Date
html_nodes(game1, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "game-summary-team--vtm", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "game-summary-team__lineup", " " ))]')

#Extracts the Score
html_nodes(game1, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "status", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "score", " " ))]')

#Extracts the Team names
html_nodes(game1, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "game-summary-team__name", " " ))]//a')

Unfortunately, I get the following:

{xml_nodeset (0)}
{xml_nodeset (0)}
{xml_nodeset (0)}

I have seen many questions and answers about this problem, but none of them seem to help.

2 answers:

Answer 0 (score: 0)

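Unfortunately, rvest does not work well with pages whose content is created dynamically by JavaScript. read_html() only downloads and parses the HTML that the server sends and never runs the page's scripts, so elements that are added afterwards do not exist in the parsed document and html_nodes() returns an empty node set. rvest works best with static HTML pages.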
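The effect can be reproduced without touching the network. The following is a minimal sketch, assuming only that rvest is installed; the inline HTML and the class names `static` and `dynamic` are made up for illustration and have nothing to do with the NBA page.

```r
library(rvest)

# A tiny page: one <div> is in the markup, the other would only be
# created by the <script> if a browser executed it.
page <- read_html('<html><body>
  <div class="static">already in the markup</div>
  <script>
    var d = document.createElement("div");
    d.className = "dynamic";
    document.body.appendChild(d);
  </script>
</body></html>')

static_nodes  <- html_nodes(page, ".static")   # present in the source
dynamic_nodes <- html_nodes(page, ".dynamic")  # never created: no JS is run

length(static_nodes)
length(dynamic_nodes)
```

The second call comes back empty for the same reason the three XPath queries in the question do: the parser sees the script as text and does not execute it.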

I suggest taking a look at RSelenium. In the end I managed to get some content from the page using rsDriver.

Code example:

library(RSelenium)
rD <- rsDriver() # runs a chrome browser, wait for necessary files to download
remDr <- rD$client
#no need for remDr$open() browser should already be open
remDr$navigate("http://stats.nba.com/game/0041700404/playbyplay/")

teams <- remDr$findElement(using = "xpath", "//span[@class='team-full']")
teams$getElementText()[[1]]
# and so on...

remDr$close()
# stop the selenium server
rD[["server"]]$stop() 
# If you forget to stop the server, it is stopped when rD is garbage collected.
# gc() must be called without rD, because rD no longer exists after rm().
rm(rD)
gc()

And so on...
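One way to combine the two packages is to let the browser render the page and then hand the rendered HTML back to rvest, so the familiar html_nodes() calls can be reused. This is only a sketch: `fetch_rendered_source` and `extract_team_names` are hypothetical helper names, the fixed five-second wait is an assumption about how long the page needs to render, and the `team-full` class is taken from the answer above and may no longer match the live site.

```r
library(rvest)

# Hypothetical helper: open a real browser so the page's JavaScript runs,
# then return the rendered HTML as a single string.
fetch_rendered_source <- function(url, wait_seconds = 5) {
  if (!requireNamespace("RSelenium", quietly = TRUE)) {
    stop("RSelenium is required for this sketch")
  }
  rD <- RSelenium::rsDriver(verbose = FALSE)
  # Always close the browser and stop the server, even on error.
  on.exit({
    rD$client$close()
    rD$server$stop()
  }, add = TRUE)
  rD$client$navigate(url)
  Sys.sleep(wait_seconds)  # crude fixed wait (assumption), not a real readiness check
  rD$client$getPageSource()[[1]]
}

# Hypothetical helper: parse rendered HTML and return every team name,
# not just the first match that findElement() would give.
extract_team_names <- function(page_source) {
  page <- read_html(page_source)
  html_text(html_nodes(page, xpath = "//span[@class='team-full']"), trim = TRUE)
}

# Usage (needs a local browser and driver, so it is left commented out):
# src <- fetch_rendered_source("http://stats.nba.com/game/0041700404/playbyplay/")
# extract_team_names(src)
```

Keeping the parsing step separate from the browser step means the selectors can be tried out on a saved copy of the page source without starting Selenium each time.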

PS: I had some trouble installing it on Windows with the current version of R. This worked: "How to set up rselenium for R?"

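Answer 1 (score: 0)

I have had success with the splashr package in R. To install it you need Docker. Installation instructions are given on the sites listed below: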

https://cran.r-project.org/web/packages/splashr/vignettes/intro_to_splashr.html

https://docs.docker.com/docker-for-mac/install/#install-and-run-docker-for-mac - how to install and run Docker on a Mac

https://splash.readthedocs.io/en/stable/install.html - enter these commands in a terminal window before using Splash
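Once a Splash container is running, a call along the following lines should return the rendered page as a parsed document that rvest can query. This is a rough sketch rather than a tested recipe: `render_with_splash` is a hypothetical wrapper, it assumes Splash is listening on localhost at its default port (for example after `docker run -p 8050:8050 scrapinghub/splash`), and the five-second wait is a guess at how long the page needs.

```r
# Hypothetical wrapper around splashr: render a JavaScript-heavy page
# through a locally running Splash instance and return the parsed document.
render_with_splash <- function(url, host = "localhost", wait_seconds = 5) {
  if (!requireNamespace("splashr", quietly = TRUE)) {
    stop("splashr is required for this sketch")
  }
  sp <- splashr::splash(host)
  # Fail early with a clear message if the container is not up.
  if (!splashr::splash_active(sp)) {
    stop("No Splash server reachable on ", host, "; is the Docker container running?")
  }
  splashr::render_html(sp, url, wait = wait_seconds)
}

# Usage (needs Docker and a running Splash container, so it is commented out):
# game1 <- render_with_splash("http://stats.nba.com/game/0041700404/playbyplay/")
# rvest::html_nodes(game1, xpath = "//span[@class='team-full']")
```

Compared with RSelenium this avoids installing a browser driver, at the cost of requiring Docker.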