Web scraping search results in R

Date: 2020-08-19 20:14:33

Tags: javascript r web-scraping phantomjs rvest

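I'm new to web scraping, and I'm trying to scrape some data produced by a website's search function. I'm using rvest to get the information, but I'm not getting any results. Here is the website: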

https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=30350&City=&StateProvCd=&Latitude=&Longitude=

Here is what I'm running:

library(rvest)

URL <- 'https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=21403&City=&StateProvCd=&Latitude=&Longitude='

webpage <- read_html(URL)

name_html <- html_nodes(webpage,'.locator_result_name')

name_data <- html_text(name_html)

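When I run this code, I get the following response: character(0)

I want the name of each company returned by the zip code search (e.g. "Townley-Kenton Insurance Agency", "Bradford Turner Insurance Group LLC").

I know there is some JavaScript on this page and I may be missing an important piece, but given my limited knowledge of HTML, CSS, and JavaScript, I'm not sure how to use V8 or PhantomJS to make this work.

Any help is appreciated.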

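1 Answer:

Answer 0 (score: 0)

The data is indeed loaded dynamically by JavaScript (via an XHR GET request). However, you can send this request directly from R with the httr package. It returns a JSON string that is easy to parse with jsonlite.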
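A minimal sketch of why read_html() on the page URL cannot see the results: everything after `#` is a URL fragment, and fragments are not sent to the server in the HTTP request. The server therefore returns the same page regardless of zip code, and the zip code is presumably picked up by the page's JavaScript in the browser. `httr::parse_url()` makes the split visible; the variable names here are illustrative only.

```r
library(httr)

# The page URL from the question: the search parameters come after "#"
page_url <- paste0("https://www.encompassinsurance.com/agency-locator.aspx",
                   "#PostalCode=21403&City=&StateProvCd=&Latitude=&Longitude=")

parts <- parse_url(page_url)

parts$hostname  # the server that read_html() contacts
parts$query     # NULL: there is no "?" query string for the server to act on
parts$fragment  # the zip code lives here, and it stays on the client side
```

Because the search parameters never reach the server, rvest only ever sees the empty results container; requesting the JSON endpoint directly sidesteps the need for V8 or PhantomJS.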

Almost all of the information you want to scrape will be in the data frame info$OfficeInfo:

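library(httr)
library(jsonlite)

# Send the same XHR GET request that the page's JavaScript makes
res <- content(GET(paste0("https://alr.encompassinsurance.com/",
                          "?PostalCode=30350&City=&StateProvCd=",
                          "&Latitude=&Longitude=")),
               as = "text", encoding = "UTF-8")

# Parse the JSON string into R lists and data frames
info <- fromJSON(res)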

info$OfficeInfo$Name
#>  [1] "Townley-Kenton Insurance Agency"                          
#>  [2] "Bradford Turner Insurance Group LLC"                      
#>  [3] "Arthur J Gallagher Risk Management Services, Inc."        
#>  [4] "Lanigan Insurance Group Inc"                              
#>  [5] "Haven Insurance Group"                                    
#>  [6] "The Leavitt Insurance Group of Atlanta, Incorporated"     
#>  [7] "Findley Insurance Agency Inc"                             
#>  [8] "Grimes Insurance Agency Inc"                              
#>  [9] "Larry L Talbert Ins Agency DBA Talbert Insurance Services"
#> [10] "The Alliance Group, Inc."                                 
#> [11] "Concierge Insurance Group LLC"                            
#> [12] "Sutter McLellan & Gilbreath Inc"                          
#> [13] "The Wichalonis Insurance Agency"                          
#> [14] "The Beck Agency"                                          
#> [15] "USI Insurance Services LLC"                               
#> [16] "The Insurance Store"                                      
#> [17] "Southern Insurance Associates of Dunwoody"                
#> [18] "D.C.J.D. Corporation DBA The Markey Insurance Group"      
#> [19] "DM Services, Incorporated"                                
#> [20] "Southern Insurance Advisors"                              
#> [21] "Metro Brokers Insurance Services"                         
#> [22] "1 Source Insurance, LLC"                                  
#> [23] "The Bates Agency II, LLC"                                 
#> [24] "Risk & Insurance Consultants Inc"                         
#> [25] "Integrity Insurance & Financial Services Inc"             
#> [26] "HN Insurance Services Inc"                                
#> [27] "Norton Metro LLC"                                         
#> [28] "The Nsure Network LLC"                                    
#> [29] "Henssler Norton Insurance LLC"                            
#> [30] "Brown & Brown Insurance of Georgia"                       
#> [31] "America Insurance Brokers, Inc. DBA AIB"                  
#> [32] "Clear View Insurance Agency"                              
#> [33] "Relation Insurance Services"                              
#> [34] "Partners Risk Services LLC"                               
#> [35] "PointeNorth Insurance Group LLC"                          
#> [36] "Advanced Insurors Inc"                                    
#> [37] "Mcever & Tribble, Inc."                                   
#> [38] "The Bethea Insurance Group, LLC"                          
#> [39] "Watchko - Young Ins Agcy Inc"                             
#> [40] "Sterling Seacrest Partners Inc"                           
#> [41] "Little & Smith, Incorporated"                             
#> [42] "LMG Insurance Services Inc"                               
#> [43] "Granite Risk Advisors LLC"                                
#> [44] "Mountain Lakes Insurance, LLC"                            
#> [45] "Hutchinson Traylor Insurance"                             
#> [46] "Edgewood Partners Insurance Center"                       
#> [47] "ADC Agency"                                               
#> [48] "MLG Insurance & Financial Services"                       
#> [49] "Burnette Insurance Agency"                                
#> [50] "Campbell and Company Enterprise, Incorporated"

Created on 2020-08-19 by the reprex package (v0.3.0)
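To run the same search for other zip codes, the request above can be wrapped in a small function. This is a sketch under assumptions: `agency_url()` and `agency_names()` are hypothetical helper names, the endpoint and parameter names are copied from the answer, and the site may change or block this endpoint at any time. `httr::modify_url()` builds the query string from a named list instead of pasting strings together.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build the XHR URL from the answer for one zip code.
# Empty strings are kept so the query matches the one the page sends.
agency_url <- function(zip) {
  modify_url("https://alr.encompassinsurance.com/",
             query = list(PostalCode = zip, City = "", StateProvCd = "",
                          Latitude = "", Longitude = ""))
}

# Hypothetical helper: fetch the JSON for one zip code and return the names
agency_names <- function(zip) {
  resp <- GET(agency_url(zip))
  stop_for_status(resp)  # stop with an informative error on HTTP failures
  info <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  info$OfficeInfo$Name
}

# Usage (requires network access):
# agency_names("21403")
# lapply(c("30350", "21403"), agency_names)
```

Keeping URL construction in its own function makes it easy to check the generated request without touching the network and to loop over many zip codes; consider adding a pause such as `Sys.sleep()` between requests to avoid hammering the server.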