How do I build a search query to fetch data through Google Search?

Time: 2018-02-04 05:00:34

Tags: r google-chrome search google-search

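I'm working on a project where I need to extract data about specific parks in Florida. For this post, my question is how to program R to get a park's area through a Google search query. When I type "area of wekiva springs state park in hectares" into Google search, I get the actual value at the top of the page ("2,833 hectares"). I currently have a list of 52 parks: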

structure(list(`unique(df$ParkName)` = structure(c(14L, 47L, 
39L, 12L, 9L, 20L, 5L, 10L, 25L, 28L, 36L, 30L, 31L, 43L, 4L, 
35L, 44L, 48L, 51L, 6L, 21L, 32L, 38L, 42L, 1L, 41L, 27L, 45L, 
46L, 50L, 18L, 37L, 24L, 26L, 13L, 52L, 15L, 2L, 17L, 11L, 22L, 
34L, 49L, 16L, 40L, 7L, 8L, 29L, 33L, 3L, 23L, 19L), .Label = c("Alafia River State Park", 
"Amelia Island State Park", "Big Cypress National Park", "Big Talbot Island State Park", 
"Bill Baggs Cape Florida State Park", "Blue Spring State Park", 
"Caladesi Island State Park", "Cayo Costa State Park", "Collier-Seminole State Park", 
"Curry Hammock State Park", "Dade Battlefield Historic State Park", 
"De Leon Springs State Park", "Delanor-Wiggins Pass State Park", 
"Fakahatchee Strand Preserve State Park", "Faver-Dykes State Park", 
"Fort Cooper State Park", "Fort George Island Cultural State Park", 
"Fort Pierce Inlet State Park/Avalon State Park", "Fort Zachary Taylor Historic State Park", 
"Highlands Hammock State Park", "Hillsborough River State Park", 
"Honeymoon Island State Park", "Hugh Taylor Birch State Park", 
"John D. MacArthur Beach State Park", "John Pennekamp Coral Reef State Park/Key Largo Hammocks", 
"John U. Lloyd Beach State Park", "Jonathan Dickinson State Park", 
"Key Largo Hammocks", "Koreshan State Historic Site", "Lake Griffin State Park", 
"Lake Kissimmee State Park", "Lake Manatee State Park", "Lake Wales Ridge Geopark", 
"Little Manatee River State Park", "Little Talbot Island State Park", 
"Long Key State Park", "Lovers Key State Park", "Myakka River State Park", 
"Ocala National Forest", "Oleta River State Park", "Oscar Scherer State Park", 
"Paynes Creek Historic State Park", "Paynes Prairie Preserve State Park", 
"Pumpkin Hill Creek Preserve State Park", "Savannas Preserve State Park", 
"Seabranch Preserve State Park", "Sebastian Inlet State Park", 
"Talbot Islands State Parks", "Terra Ceia Preserve State Park", 
"Tosohatchee Wildlife Management Area", "Washington Oaks Gardens State Park", 
"Werner-Boyce Salt Springs State Park"), class = "factor")), .Names = "unique(df$ParkName)", row.names = c(NA, 
-52L), class = "data.frame")

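I could type each park name into the Google search bar by hand, but I would really like to figure out how to build a search query for this so that I can apply it to future projects. The problem is that I'm a bit out of my depth when it comes to building anything complex. I've only recently started learning about things like what an "API" is.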

Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 2)

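You can do this by web scraping with the rvest package. The results depend heavily on each query, because not every query returns a value at the top of the page.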

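library(rvest)

# Park names to look up (each string kept on one line so that no
# line break ends up inside the query)
parks <- data.frame(name = c("wekiva springs state park",
                             "cayo costa state park"),
                    stringsAsFactors = FALSE)

url <- "http://www.google.com"

# Open a session and take the search form from the Google home page.
# In rvest >= 1.0.0 these functions are named session(),
# html_form_set() and session_submit().
s <- html_session(url)
search <- html_form(s)[[1]]

for (i in seq_len(nrow(parks))) {
  query <- paste("area of", parks$name[i], "in hectares")
  a <- set_values(search, q = query)

  # Submit the form and read the text of the results container
  session <- submit_form(s, a)
  s1 <- html_nodes(session, "#res")
  result <- html_text(s1)

  # Keep the text up to and including the first word (the unit)
  parks$area[i] <- gsub("([A-Za-z]+).*", "\\1", result)
}

parks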

                    name     area
1 wekiva springs state park 2.833 ha
2     cayo costa state park 1.014 ha 
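
The values above come back as text such as "2.833 ha", and some queries may return no value at all. If you need numbers, something along these lines could convert the scraped text. This is only a sketch: `parse_area` is a hypothetical helper, not part of rvest, and it assumes that a "," or "." followed by exactly three digits is a thousands separator.

```r
# Hypothetical helper: pull the first number out of a scraped string and
# return it as numeric, or NA when there is nothing usable.
parse_area <- function(text) {
  # No node found, or a missing value: nothing to parse
  if (length(text) == 0 || is.na(text[1])) return(NA_real_)
  text <- text[1]

  # First run of digits, possibly containing "," or "." separators
  m <- regmatches(text, regexpr("[0-9][0-9.,]*", text))
  if (length(m) == 0) return(NA_real_)

  # Assumption: "," or "." followed by exactly three digits is a
  # thousands separator, so "2.833" and "2,833" both mean 2833
  cleaned <- gsub("[.,]([0-9]{3})(?![0-9])", "\\1", m, perl = TRUE)
  # Drop punctuation left over from the surrounding sentence
  cleaned <- sub("[.,]+$", "", cleaned)
  # Treat a remaining comma as a decimal mark
  cleaned <- sub(",", ".", cleaned, fixed = TRUE)

  as.numeric(cleaned)
}

# Apply it to a column of scraped strings, one value at a time
areas <- vapply(c("2.833 ha", "1.014 ha", "no value found"),
                parse_area, numeric(1), USE.NAMES = FALSE)
```

Inside the loop you could then store `parse_area(result)` instead of the raw text, which also avoids an error when a query returns no `#res` node.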

To get familiar with rvest, here's a good starting point.
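
Since the question asks how to build the search query itself, here is a minimal sketch of turning a list of park names into search URLs with base R only. `build_query_urls` is a hypothetical helper name, and the "area of ... in hectares" wording is taken from the question. It assumes the standard `https://www.google.com/search?q=` address, and makes no promise about what Google returns for a given URL.

```r
# Hypothetical helper: build one Google search URL per park name.
build_query_urls <- function(park_names) {
  # as.character() so that factor columns (as in the question's data) work too
  queries <- paste("area of", as.character(park_names), "in hectares")

  # Percent-encode each query so spaces and characters such as "/" are
  # safe inside a URL
  encoded <- vapply(queries, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)

  paste0("https://www.google.com/search?q=", encoded)
}

urls <- build_query_urls(c("Wekiva Springs State Park",
                           "Fort Pierce Inlet State Park/Avalon State Park"))
```

With the data frame from the question you would pass its single column, for example `build_query_urls(df[[1]])`, and then request each URL in turn.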