Passing a date parameter to a REST API call using R

Date: 2017-10-03 11:03:51

Tags: r url-encoding jsonlite

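I'm trying to pull some data from a REST API, but I can't get the date parameter into the query string correctly. Using sprintf I managed to pass in the search term and the website, but I've had no luck with discoverDate.

The API in question is https://newsriver.io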

Function to grab data by one search term and one website

    library(httr)
    library(jsonlite)

    get_newsriver_content <- function(searcht, website, api_key) {
      # Literal "%" characters must be written as "%%" inside a sprintf() format
      url <- sprintf('https://api.newsriver.io/v2/search?query=text%%3A%s%%20OR%%20website.domainName%%3A%s%%20OR%%20language%%3AEN&sortBy=_score&sortOrder=DESC&limit=100', searcht, website)
      news_get <- GET(url, add_headers(Authorization = api_key))
      news_txt <- content(news_get, as = "text", encoding = "UTF-8")
      news_df <- fromJSON(news_txt)
      news_df$discoverDate <- as.Date(news_df$discoverDate)
      news_df
    }

Update: I would also like to make multiple API calls based on a vector of dates.

1 answer:

Answer 0 (score: 1)

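Here is how I solved my problem.

This was really a two-step problem:

  1. Figure out how to correctly encode the query I want to insert into the curl call.
  2. Create a function that makes API calls based on a vector of dates and appends the results to a data frame.

Here is how I did it.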

    library(tidyverse)
    library(jsonlite)
    library(urltools)
    library(httr)
    
    # Function for pulling by date
    # (pb and api_key are globals that are defined further down)
    get_newsriver_bydate <- function(query) {

      # Being kind to the free API - shout out to Elia at Newsriver, who has been ever patient
      pb$tick()$print()
      Sys.sleep(sample(seq(0.5, 2.5, 0.5), 1))

      # This is where I used the urltools package, as suggested by quartin
      url_base <- "https://api.newsriver.io/v2/search"
      create_curl_call <- url_base %>%
        param_set("query", url_encode(query)) %>%
        param_set("sortBy", "_score") %>%
        param_set("sortOrder", "DESC") %>%
        param_set("limit", "100")

      # I had most of this before; however, I changed my output to a tibble,
      # which is more versatile to work with
      get_curl <- GET(create_curl_call, add_headers(Authorization = api_key))
      curl_to_json <- content(get_curl, as = "text", encoding = "UTF-8")
      news_df <- fromJSON(curl_to_json, flatten = TRUE)
      news_df$discoverDate <- as.Date(news_df$discoverDate)
      as_tibble(news_df)
    }
    
    # Set configuration and API key
    # Note: ssl_verifypeer = 0L turns off SSL certificate verification for all httr requests
    set_config(config(ssl_verifypeer = 0L))
    api_key <- "mykey"
    
    #Set my vector of Dates
    dates1 <- seq(as.Date("2017-09-01"), as.Date("2017-10-01"), by = "days")
    
    #Set up my progress bar
    pb <- progress_estimated(length(dates1))
    
    #Sprintf my query into a vector of queries based on date
    query <- sprintf('text:"Canada" AND text:"Rocks" AND language:EN AND discoverDate:[%s TO %s]',dates1, dates1)
    
    # Run the query and be patient
    news_df <- map_df(query, get_newsriver_bydate, .id = "query")
    

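So, as for my research approach and how I solved these two problems:

  1. Quartin suggested I look at the urltools package (https://cran.rstudio.com/web/packages/urltools/index.html). This package helps you encode and decode URLs, and it has various other fast, vectorized functions. My next problem was getting the query itself right. For that I simply looked up the API documentation, which I recommend to anyone trying to pull data from an API. It may sound like a no-brainer, but I had not read it in full before posting my question.

  2. To create the function I drew on a number of previous answers, but the post "API Query for loop" helped the most. It helped me with the progress bar and the map function, which pull everything together into a single data frame.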
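
The encoding step from point 1 can be sketched without any extra packages. This is only a minimal illustration that swaps in base R's utils::URLencode() for urltools, assuming the same Lucene-style date-range syntax as the query above. The dates are example values and the limit parameter is copied from the code above.

```r
# Query text with a date range, written the way a person would type it
query <- 'language:EN AND discoverDate:[2017-09-01 TO 2017-09-02]'

# reserved = TRUE also escapes ":", "[" and "]", not only the spaces
encoded <- utils::URLencode(query, reserved = TRUE)

# Build the request URL by pasting the already-encoded value in place,
# so there is no need to hand-write "%%3A"-style escapes in a sprintf() format
url <- paste0("https://api.newsriver.io/v2/search?query=", encoded, "&limit=100")

# Decoding gives back the original query text
decoded <- utils::URLdecode(encoded)
print(url)
```

Encoding the whole query in one call is what makes the date range safe to send: the brackets, colons and spaces that break a hand-built URL are all escaped in one place.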

There may be a better answer, but so far this works for me.
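
For readers without the tidyverse installed, the "one call per date, then combine" pattern can be sketched in base R. This is an illustration under assumptions, not the code I actually ran: fetch_one() is a hypothetical stand-in that returns a dummy one-row data frame so the sketch runs without network access or an API key, and lapply() plus rbind() take the place of map_df().

```r
# One query string per day; sprintf() is vectorised over the date strings
dates <- seq(as.Date("2017-09-01"), as.Date("2017-09-03"), by = "days")
day_strings <- format(dates)
queries <- sprintf('text:"Canada" AND discoverDate:[%s TO %s]', day_strings, day_strings)

# Hypothetical stand-in for the real API call (assumption, not part of any package):
# it only echoes the query back in a one-row data frame
fetch_one <- function(query) {
  data.frame(query = query, n_chars = nchar(query), stringsAsFactors = FALSE)
}

# Call the function once per query, pausing briefly between requests
results <- lapply(queries, function(q) {
  Sys.sleep(0.1)  # short pause for the sketch; the real code waits 0.5 to 2.5 seconds
  fetch_one(q)
})

# Stack the per-day data frames into a single data frame
news_all <- do.call(rbind, results)
print(news_all)
```

In real use, fetch_one() would be replaced by a function that performs the request and returns a data frame with the same columns for every date, otherwise rbind() will fail when the pieces do not line up.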