Scraping data using rvest

Time: 2015-06-12 20:50:17

Tags: r web-scraping rvest

I am trying to scrape the name of each search result from this page using the following code:

url2 <- "http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6"

library(rvest)

# read_html() is used here because html() was deprecated in favor of it
results <- url2 %>%
  read_html() %>%
  html_nodes(".desc_title") %>%
  html_text()
results

But it only returns:

character(0)

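Any ideas on how to fix this? Thanks for the help!

1 Answer:

Answer 0: (score: 5)

Here is a solution using RSelenium and rvest.

Note: For information on using RSelenium together with rvest, see my answer here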

library(RSelenium)
library(rvest)

# Start a local Selenium server and open a Firefox session
startServer()
remDr <- remoteDriver(browserName = "firefox")
remDr$open()

url2 <- "http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6"
remDr$navigate(url2)

# Parse the page source as rendered by the browser
# (read_html() is used because html() was deprecated in favor of it)
test.html <- read_html(remDr$getPageSource()[[1]])
results <- test.html %>%
  html_nodes(".desc_title") %>%
  html_text(trim = TRUE)
results

 [1] "2009 FREIGHTLINER FLD 132 CLASSIC XL HIGHWAY TR..." "2014 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"
 [3] "2014 KENWORTH W900-L HIGHWAY TRACTOR"               "2014 KENWORTH T660 HIGHWAY TRACTOR"                
 [5] "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"          "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"         
 [7] "(5) 2013 FREIGHTLINER CASCADIA - 113 HIGHWAY TR..." "(2) 2013 INTERNATIONAL PROSTAR HIGHWAY TRACTOR"    
 [9] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900B HIGHWAY TRACTOR"               
[11] "(2) 2013 KENWORTH T700 HIGHWAY TRACTOR"             "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[13] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900L HIGHWAY TRACTOR"               
[15] "2013 KENWORTH W900L HIGHWAY TRACTOR"                "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[17] "2013 PETERBILT 388 HIGHWAY TRACTOR"                 "(5) 2013 PETERBILT 388 HIGHWAY TRACTOR"            
[19] "2013 PETERBILT 389 HIGHWAY TRACTOR"                 "2013 PETERBILT 388 HIGHWAY TRACTOR"                
[21] "2013 VOLVO VNL670 HIGHWAY TRACTOR"                  "2013 VOLVO VNL630 HIGHWAY TRACTOR"                 
[23] "(5) 2012 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"     "2012 FREIGHTLINER CA125 HIGHWAY TRACTOR"           
[25] "(2) 2012 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"   
remDr$close()
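
The answer does not say why the plain rvest call returned character(0). One plausible reading, which is an assumption here and not something the original confirms, is that the listings are added to the page after the initial HTML is delivered, so the selector finds nothing until a browser has rendered the page. The sketch below illustrates that difference offline. The two HTML strings and the helper titles_from() are hypothetical stand-ins, not the real site's markup.

```r
library(rvest)

# Hypothetical markup: what a plain HTTP fetch might see (no listings yet)
static_page <- read_html('<html><body><div id="results"></div></body></html>')

# Hypothetical markup: what a browser might hold after rendering
rendered_page <- read_html(
  '<html><body><div id="results">
     <a class="desc_title"> 2014 KENWORTH T660 HIGHWAY TRACTOR </a>
   </div></body></html>'
)

# Hypothetical helper: same selector and text extraction as in the answer
titles_from <- function(page) {
  page %>%
    html_nodes(".desc_title") %>%
    html_text(trim = TRUE)
}

titles_from(static_page)    # no matching nodes, so an empty character vector
titles_from(rendered_page)  # one matching node, whitespace trimmed
```

A quick check of length(results) on the plain fetch versus on remDr$getPageSource() would show whether this is what happens for a given page.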

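Another approach is to use PhantomJS (no need to use cmd, and no additional browser is required). The only thing needed here is to download the exe file from here and place it in the R working directory (you can also specify a path if you do not want to put it in the working directory).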

library(RSelenium)
library(rvest)

# Launch PhantomJS (the exe is expected in the R working directory)
pJS <- phantom(extras = c('--ssl-protocol=tlsv1'))
remDr <- remoteDriver(browserName = "phantom")
remDr$open()
remDr$navigate("http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6")

# Parse the rendered page source
# (read_html() is used because html() was deprecated in favor of it)
test.html <- read_html(remDr$getPageSource()[[1]])
results <- test.html %>%
  html_nodes(".desc_title") %>%
  html_text(trim = TRUE)
results
 [1] "2009 FREIGHTLINER FLD 132 CLASSIC XL HIGHWAY TR..." "2014 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"
 [3] "2014 KENWORTH W900-L HIGHWAY TRACTOR"               "2014 KENWORTH T660 HIGHWAY TRACTOR"
 [5] "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"          "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"
 [7] "(5) 2013 FREIGHTLINER CASCADIA - 113 HIGHWAY TR..." "(2) 2013 INTERNATIONAL PROSTAR HIGHWAY TRACTOR"
 [9] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900B HIGHWAY TRACTOR"
[11] "(2) 2013 KENWORTH T700 HIGHWAY TRACTOR"             "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[13] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900L HIGHWAY TRACTOR"               
[15] "2013 KENWORTH W900L HIGHWAY TRACTOR"                "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[17] "2013 PETERBILT 388 HIGHWAY TRACTOR"                 "(5) 2013 PETERBILT 388 HIGHWAY TRACTOR"            
[19] "2013 PETERBILT 389 HIGHWAY TRACTOR"                 "2013 PETERBILT 388 HIGHWAY TRACTOR"                
[21] "2013 VOLVO VNL670 HIGHWAY TRACTOR"                  "2013 VOLVO VNL630 HIGHWAY TRACTOR"    
# Close the browser session, then stop the PhantomJS process
remDr$close()
pJS$stop()

P.S. See the help files for more details.