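I am unable to extract image links from a website.
I am new to data scraping. I have used SelectorGadget as well as the browser's Inspect Element tool to find the class of the images, but to no avail.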
library(rvest)
main.page <- read_html(x = "https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974")
urls <- main.page %>%
  html_nodes(".match-detail--item:nth-child(9) .lazyloaded") %>%
  html_attr("src")
sotu <- data.frame(urls = urls)
I get the following output:
<0 rows> (or 0-length row.names)
Answer 0 (score: 2)
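For some reason, certain classes and attributes do not appear in the scraped data, most likely because they are added by JavaScript in the browser, which rvest does not run. Simply target img instead of .lazyloaded, and data-src instead of src: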
library(rvest)
main.page <- read_html("https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974")
main.page %>%
  html_nodes(".match-detail--item:nth-child(9) img") %>%
  html_attr("data-src")
#### OUTPUT ####
[1] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/1.png&h=25&w=25"
[2] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[3] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[4] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[5] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[6] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[7] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[8] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[9] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[10] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[11] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
[12] "https://a1.espncdn.com/combiner/i?img=/i/teamlogos/cricket/500/6.png&h=25&w=25"
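The same idea can be tried offline. The following is a minimal sketch, assuming a made-up HTML fragment (the `.item` class and the example.com URLs are hypothetical, not taken from the ESPNcricinfo page). It shows that a browser-only class such as `.lazyloaded` matches nothing in static markup, and that falling back from `data-src` to `src` covers images that are not lazy-loaded.

```r
library(rvest)

# Hypothetical static HTML, standing in for what the server sends
# before any JavaScript has run (no "lazyloaded" class yet).
html <- '<div class="item">
  <img class="lazyload" data-src="https://example.com/a.png">
  <img class="lazyload" data-src="https://example.com/b.png" src="https://example.com/placeholder.png">
  <img src="https://example.com/c.png">
</div>'

page <- read_html(html)

# The class added in the browser is absent here, so this selects nothing.
n_lazyloaded <- length(html_nodes(page, ".item .lazyloaded"))

# Select the <img> tags themselves and read both attributes.
imgs     <- html_nodes(page, ".item img")
data_src <- html_attr(imgs, "data-src")  # NA where the attribute is missing
src      <- html_attr(imgs, "src")

# Prefer data-src; fall back to src for images that are not lazy-loaded.
urls <- ifelse(is.na(data_src), src, data_src)
print(urls)
```

If a page mixes lazy-loaded and ordinary images, this fallback avoids NA entries in the result. Whether the real page needs it is an assumption that should be checked against its actual markup.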
Answer 1 (score: 0)
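Because the DOM is modified by JavaScript (the site uses React) when the page is viewed in a browser, rvest does not see the same layout. A less ideal alternative is to use a regular expression to pull the information out of the JavaScript object that holds the links, and then use a JSON parser to extract them.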
library(rvest)
library(jsonlite)
library(stringr)
library(magrittr)

url <- "https://www.espncricinfo.com/series/17213/scorecard/64951/england-vs-india-1st-odi-india-tour-of-england-1974"

# Collapse the text of <body> into a single string
r <- read_html(url) %>%
  html_nodes('body') %>%
  html_text() %>%
  toString()

# Capture the JSON array that follows the "debuts" key
x <- str_match_all(r, 'debuts":(.*?\\])')
# Parse the captured array and print the image links
json <- jsonlite::fromJSON(x[[1]][, 2])
print(json$imgicon)
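The regex-then-parse step can be checked without a network request. This is a rough sketch on a made-up string: the `window.__DATA__` wrapper and all values are hypothetical, and only the `debuts` key and `imgicon` field mirror the code above. It also guards against the pattern not matching.

```r
library(stringr)
library(jsonlite)

# Hypothetical page text with an embedded JSON array (values are invented).
page_text <- 'window.__DATA__ = {"match":{"id":1},"debuts":[{"name":"A","imgicon":"https://example.com/a.png"},{"name":"B","imgicon":"https://example.com/b.png"}],"other":[]};'

# Lazily capture from the opening "[" after "debuts": up to the first "]".
m <- str_match(page_text, '"debuts":(\\[.*?\\])')

# str_match() returns NA in the capture column when nothing matches.
if (is.na(m[1, 2])) {
  stop("No 'debuts' array found in the page text")
}

# An array of objects parses into a data frame, one column per field.
debuts <- fromJSON(m[1, 2])
icons  <- debuts$imgicon
print(icons)
```

Note that the lazy match stops at the first `]`, so it would cut the JSON short if any object inside the array contained a nested array. In that case a wider pattern or parsing the whole embedded object would be needed.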