RSelenium and purrr error when web scraping

Time: 2021-04-26 09:45:31

Tags: r selenium purrr rvest

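I am using RSelenium to scrape car data from carsales.com.au. The map_df part of the code has worked very well in the past and handles empty fields easily. On this website, however, it throws the following error: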

Error: Internal error in `vec_assign()`: `value` should have been recycled to fit `x`.

I have done some research on this error, but it is beyond me.

The code is as follows:

library(tidyverse)
library(rvest)
library(RSelenium)


# Start a Selenium server and open a Firefox session
rD <- RSelenium::rsDriver(browser="firefox", port= 4845L)
remDr <- rD[["client"]]

# Manually calculate the page offsets (the website uses an offset of 12 per page)
pages <- seq(from = 0, to = 770, by = 12)
cars <- tibble()

for (i in pages) {
#create URLs to loop over
  url <- 'https://www.carsales.com.au/cars/ford/territory/'
  url <- print(paste0(url,'?offset=', i))

  #Navigate to URLs
  remDr$navigate(url)
  soup <- remDr$getPageSource()
  soup <- xml2::read_html(soup[[1]])
  
  data <- soup %>% 
    html_nodes('div.listing-wrapper') %>% 
    map_df(~list(Model = html_nodes(.x, 'div.col > h3') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 Price = html_nodes(.x, '.price > a') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 Deets = html_nodes(.x, '.key-details') %>%
                   html_text() %>% 
                   {if(length(.) == 0) NA  else .},
                 sellerType = html_nodes(.x, '.seller-type' ) %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .},
                 sellerLoc = html_nodes(.x, '.seller-location') %>% 
                   html_text() %>% 
                   {if(length(.) == 0) NA else .}
                )
          )
  cars <- as_tibble(rbind(cars, data))
  }
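
One possible cause, not verified against the live site: inside a single listing, a selector such as '.key-details' may match several nodes while the other selectors match one or none. The list passed to map_df then has fields of different lengths, which would be consistent with the `vec_assign()` recycling error. The sketch below uses made-up HTML in place of the real page and a hypothetical helper, one_value(), that collapses every field to exactly one string per listing. It assumes the tibble and dplyr packages are installed.

```r
library(rvest)   # also re-exports %>% and read_html()
library(purrr)

# Hypothetical markup standing in for a results page; the real site's
# structure is not reproduced here.
page <- read_html('
  <div class="listing-wrapper">
    <div class="col"><h3>Car A</h3></div>
    <div class="price"><a>$10,000</a></div>
    <span class="key-details">SUV</span>
    <span class="key-details">Automatic</span>
  </div>
  <div class="listing-wrapper">
    <div class="col"><h3>Car B</h3></div>
  </div>')

# Hypothetical helper: return exactly one string per listing, with all
# matches pasted together, or NA when the selector matches nothing.
one_value <- function(node, css) {
  txt <- html_text(html_nodes(node, css), trim = TRUE)
  if (length(txt) == 0) NA_character_ else paste(txt, collapse = " | ")
}

# Build a one-row tibble per listing, then stack the rows.
cars <- page %>%
  html_nodes("div.listing-wrapper") %>%
  map(~ tibble::tibble(
    Model = one_value(.x, "div.col > h3"),
    Price = one_value(.x, ".price > a"),
    Deets = one_value(.x, ".key-details")
  )) %>%
  dplyr::bind_rows()

print(cars)
```

Because every field has length one, each listing becomes a single row regardless of how many nodes a selector matches. If the separate detail items matter, they could be split out of the collapsed string afterwards.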

Thanks for any help with this project.

Cheers

0 answers:

No answers