Question

It does work, but I want to scrape the description of the first link that Google returns. For the keyword CRAN, it is:

<span class="st"><em>CRAN</em> is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the <em>CRAN</em>&nbsp;...</span>

But I don't know what the span part here is. Please provide a solution that does not use RSelenium.

Answer 1

Using rvest:

library(rvest)

baseUrl <- 'https://www.google.it/search?q='

query <- 'cran'
url <- paste0(baseUrl, query)


read_html(url) %>% 
    # Result descriptions are the elements with class "st"
    html_nodes('.st') %>% 
    # This picks a single result by position; change the index to select another
    # result, or comment this line out to get all first-page results
    `[`(2) %>% 
    html_text()
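
To see what the '.st' selector matches without sending a request to Google, here is a minimal offline sketch. The HTML string is a hypothetical stand-in for a results page: only the class name and the shortened CRAN text come from the question, the second span is made up, and Google's real markup may differ or change over time.

```r
library(rvest)

# Hypothetical stand-in for a results page (not real Google output).
# The span with class "st" is the element that holds a result's description.
page <- read_html('<html><body>
  <div><span class="st"><em>CRAN</em> is a network of ftp and web servers</span></div>
  <div><span class="st">Second snippet</span></div>
</body></html>')

# '.st' is a CSS class selector: it matches every element whose class is "st"
snippets <- page %>%
    html_nodes('.st') %>%
    html_text()

# html_text() drops the inner <em> tags and keeps only the text
first_snippet <- snippets[1]
```

So the span in the question is simply the container for the description, and its class is what the selector keys on.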

Answer 2

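You can scrape from the Google Knowledge Graph (the summary box on the right-hand side of the Google search results page).

You can use the Google Knowledge Graph API for this:

Create an application in the Google Developers Console

Create authentication credentials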

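library(jsonlite)

knowlegdegraph <- function(query)
{
  API_Key <- "Your_API_KEY"
  # paste0() avoids the spaces that paste() would insert between the URL parts
  url <- paste0("https://kgsearch.googleapis.com/v1/entities:search?query=",
                URLencode(query, reserved = TRUE),
                "&key=", API_Key,
                "&limit=1&indent=True")
  # Parse the JSON response into an R list and return it
  fromJSON(url)
}

jdata <- knowlegdegraph("cran")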

jdata is a list. You can extract the JSON elements that hold the description as follows:

Short description:

jdata[["itemListElement"]][["result"]][["description"]]

Detailed description:

jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]
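
To check these accessors without an API key, the sketch below parses a hypothetical, trimmed response with jsonlite. The field names are the ones used above; every value is a placeholder, and a real response contains more fields than this.

```r
library(jsonlite)

# Hypothetical, trimmed response: field names follow the accessors above,
# values are placeholders rather than real API output
sample_json <- '{
  "itemListElement": [
    {
      "result": {
        "name": "Example entity",
        "description": "Short placeholder description",
        "detailedDescription": {
          "articleBody": "Longer placeholder description of the entity."
        }
      }
    }
  ]
}'

# fromJSON() simplifies the array into a data frame with nested columns,
# which is why chained [[ ]] lookups work
jdata <- fromJSON(sample_json)

short_desc <- jdata[["itemListElement"]][["result"]][["description"]]
long_desc  <- jdata[["itemListElement"]][["result"]][["detailedDescription"]][["articleBody"]]
```

With limit=1 each lookup yields a single string; with a larger limit the same expressions would return one value per result.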

How to get the description of the first link from Google?
