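I'm using RSelenium to scrape car data from carsales.com.au. The map_df part of my code has worked well in the past and handled empty fields easily, but on this site it throws the following error: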
Error: Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.
I've done some research on this error, but it's beyond my depth.
Here is the code:
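A likely cause, offered as a hypothesis rather than a confirmed diagnosis: the `{if(length(.) == 0) NA else .}` guard only handles zero matches. If a selector matches more than one node inside a single listing (for example two `<a>` tags under `.price`), that field becomes a vector longer than one. The per-listing lists then no longer have a consistent one-value-per-field shape for map_df() to row-bind, which is one plausible trigger for the vec_assign() error. The sketch below reproduces the situation offline; the HTML is made up and only mimics the selectors used in the code.

```r
library(rvest)   # re-exports read_html() and %>%
library(purrr)

# Hypothetical markup: one listing with two price links, one with none
html <- read_html('
  <div class="listing-wrapper"><div class="price"><a>$10,000</a><a>Drive away</a></div></div>
  <div class="listing-wrapper"></div>')

# Same guard as in the question: only replaces ZERO matches with NA
guard <- function(x) if (length(x) == 0) NA else x

price_per_listing <- html %>%
  html_nodes("div.listing-wrapper") %>%
  map(~ html_nodes(.x, ".price > a") %>% html_text() %>% guard())

lengths(price_per_listing)
#> [1] 2 1
```

The first listing yields two prices, so the guard lets a length-2 vector through unchanged.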
library(tidyverse)
library(rvest)
library(RSelenium)
# navigate to homepage and get html
rD <- RSelenium::rsDriver(browser = "firefox", port = 4845L)
remDr <- rD[["client"]]
# - Manually Calculate pages (website uses an offset of 12 per page)
pages <- seq(from = 0, to = 770, by = 12)
cars <- tibble()
for (i in pages) {
  # create URLs to loop over
  url <- 'https://www.carsales.com.au/cars/ford/territory/'
  url <- print(paste0(url, '?offset=', i))
  # Navigate to URLs
  remDr$navigate(url)
  soup <- remDr$getPageSource()
  soup <- xml2::read_html(soup[[1]])
  data <- soup %>%
    html_nodes('div.listing-wrapper') %>%
    map_df(~list(
      Model = html_nodes(.x, 'div.col > h3') %>%
        html_text() %>%
        {if (length(.) == 0) NA else .},
      Price = html_nodes(.x, '.price > a') %>%
        html_text() %>%
        {if (length(.) == 0) NA else .},
      Deets = html_nodes(.x, '.key-details') %>%
        html_text() %>%
        {if (length(.) == 0) NA else .},
      sellerType = html_nodes(.x, '.seller-type') %>%
        html_text() %>%
        {if (length(.) == 0) NA else .},
      sellerLoc = html_nodes(.x, '.seller-location') %>%
        html_text() %>%
        {if (length(.) == 0) NA else .}
    ))
  cars <- as_tibble(rbind(cars, data))
}
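One way to guarantee exactly one value per field per listing is to select the listings once and then use html_element() (rvest >= 1.0.0; html_node() is the older name). It returns the first match, or a missing node that html_text() turns into NA, so every column has the same length as the number of listings. This is a sketch under assumptions: the selectors are copied from the question and not re-checked against the live site, `extract_listings()` is a hypothetical helper name, and the HTML below is invented for a quick offline check.

```r
library(rvest)   # >= 1.0.0 for html_element() / html_elements()
library(tibble)

# Hypothetical helper: one row per listing, one value per column.
# html_element() keeps the FIRST match or a missing node (-> NA in html_text()).
extract_listings <- function(page) {
  listings <- html_elements(page, "div.listing-wrapper")
  first_text <- function(css) html_text(html_element(listings, css), trim = TRUE)
  tibble(
    Model      = first_text("div.col > h3"),
    Price      = first_text(".price > a"),
    Deets      = first_text(".key-details"),
    sellerType = first_text(".seller-type"),
    sellerLoc  = first_text(".seller-location")
  )
}

# Invented markup standing in for one results page
page <- read_html('
  <div class="listing-wrapper">
    <div class="col"><h3>2011 Ford Territory TS</h3></div>
    <div class="price"><a>$10,990</a><a>Excl. Govt. Charges</a></div>
  </div>
  <div class="listing-wrapper">
    <div class="col"><h3>2014 Ford Territory TX</h3></div>
  </div>')

extract_listings(page)
```

Inside the loop, `data <- extract_listings(soup)` could stand in for the map_df() call. Note the trade-off: taking the first match silently drops extra matches (here the second price link). If those matter, collapse them per listing with `paste(..., collapse = " ")` instead. Collecting each page's tibble in a list and calling `dplyr::bind_rows()` once after the loop also avoids growing `cars` with rbind() on every iteration.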
Thanks for any help with this project.
Cheers