Unable to get data from the NYT Community API

Asked: 2016-09-04 15:16:02

Tags: r getjson

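I am trying to do some NLP on reader comments on New York Times opinion pieces. I have obtained my Community API key and am following the examples on the Times website and in the rtimes package (which has no actual Community API functions). My script does not throw an error, but it does not return any data either.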

This is the GET request the Times suggests for retrieving comments by article URL:

http://api.nytimes.com/svc/community/{version}/user-content/url.json?api-key={your-API-key}&url={url}[&offset=int]  
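In this template the square brackets around `&offset=int` mark the parameter as optional, and `int` stands for an integer value, so neither should appear literally in a request. A minimal sketch of filling in the template with base R follows. The helper name `build_comments_url` is hypothetical, and percent-encoding the article URL is an assumption about what the API expects, made so that any `?` or `&` inside the article URL is not read as part of the API query string.

```r
# Hypothetical helper: fills in the URL template shown above.
# Assumption: the API accepts a percent-encoded article URL.
build_comments_url <- function(api_key, article_url, offset = 0,
                               version = "v3") {
  base <- sprintf(
    "http://api.nytimes.com/svc/community/%s/user-content/url.json",
    version
  )
  paste0(
    base,
    # reserved = TRUE also encodes ":", "/", "?", "&" and "=";
    # repeated = TRUE encodes even if the input already contains "%XX".
    "?api-key=", URLencode(api_key, reserved = TRUE, repeated = TRUE),
    "&url=", URLencode(article_url, reserved = TRUE, repeated = TRUE),
    # The offset is written as a plain integer, never as the word "int".
    "&offset=", sprintf("%d", as.integer(offset))
  )
}

# Placeholder key and a shortened article URL, for illustration only.
request_url <- build_comments_url(
  "KEY", "http://www.nytimes.com/a.html?ref=opinion&_r=0", offset = 25
)
print(request_url)
```

The resulting string could then be passed to `GET()`. httr's `GET()` also accepts a `query = list(...)` argument that performs similar encoding.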

So this is what I tried:

library(httr)
library(RJSONIO)
library(RCurl)

jesusComments <- GET(paste0('http://api.nytimes.com/svc/community/v3/user-content/url.json?', 'api-key=', communityAPI, '&url=', q, '&offset=int'))

communityAPI is my key, and q is the URL of the article whose comments I am trying to fetch. This is what it returns:

> str(jesusComments)
List of 10
 $ url        : chr "http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=communityAPI&url=http://www.nytimes.c"| __truncated__
 $ status_code: int 400
 $ headers    :List of 17
  ..$ cache-control               : chr "max-age=10"
  ..$ content-type                : chr "application/json; charset=UTF-8"
  ..$ date                        : chr "Sun, 04 Sep 2016 13:33:20 GMT"
  ..$ expires                     : chr "Sun, 04 Sep 2016 13:33:30 GMT"
  ..$ last-modified               : chr "Sun, 04 Sep 2016 13:33:20"
  ..$ pragma                      : chr "cache"
  ..$ server                      : chr "nginx/1.10.1"
  ..$ via                         : chr "kong/0.8.3"
  ..$ x-kong-proxy-latency        : chr "2"
  ..$ x-kong-upstream-latency     : chr "31"
  ..$ x-powered-by                : chr "PHP/5.5.30"
  ..$ x-ratelimit-limit-day       : chr "1000"
  ..$ x-ratelimit-limit-second    : chr "5"
  ..$ x-ratelimit-remaining-day   : chr "939"
  ..$ x-ratelimit-remaining-second: chr "4"
  ..$ content-length              : chr "237"
  ..$ connection                  : chr "keep-alive"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 1
  ..$ :List of 3
  .. ..$ status : int 400
  .. ..$ version: chr "HTTP/1.1"
  .. ..$ headers:List of 17
  .. .. ..$ cache-control               : chr "max-age=10"
  .. .. ..$ content-type                : chr "application/json; charset=UTF-8"
  .. .. ..$ date                        : chr "Sun, 04 Sep 2016 13:33:20 GMT"
  .. .. ..$ expires                     : chr "Sun, 04 Sep 2016 13:33:30 GMT"
  .. .. ..$ last-modified               : chr "Sun, 04 Sep 2016 13:33:20"
  .. .. ..$ pragma                      : chr "cache"
  .. .. ..$ server                      : chr "nginx/1.10.1"
  .. .. ..$ via                         : chr "kong/0.8.3"
  .. .. ..$ x-kong-proxy-latency        : chr "2"
  .. .. ..$ x-kong-upstream-latency     : chr "31"
  .. .. ..$ x-powered-by                : chr "PHP/5.5.30"
  .. .. ..$ x-ratelimit-limit-day       : chr "1000"
  .. .. ..$ x-ratelimit-limit-second    : chr "5"
  .. .. ..$ x-ratelimit-remaining-day   : chr "939"
  .. .. ..$ x-ratelimit-remaining-second: chr "4"
  .. .. ..$ content-length              : chr "237"
  .. .. ..$ connection                  : chr "keep-alive"
  .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ cookies    :'data.frame':    0 obs. of  7 variables:
  ..$ domain    : logi(0) 
  ..$ flag      : logi(0) 
  ..$ path      : logi(0) 
  ..$ secure    : logi(0) 
  ..$ expiration:Classes 'POSIXct', 'POSIXt'  num(0) 
  ..$ name      : logi(0) 
  ..$ value     : logi(0) 
 $ content    : raw [1:237] 7b 22 64 65 ...
 $ date       : POSIXct[1:1], format: "2016-09-04 13:33:20"
 $ times      : Named num [1:6] 0 0.0958 0.1886 0.1887 0.3195 ...
  ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect"     "pretransfer" ...
 $ request    :List of 7
  ..$ method    : chr "GET"
  ..$ url       : chr "http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=communityAPI&url=http://www.nytimes.c"| __truncated__
  ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
  .. ..- attr(*, "names")= chr "Accept"
  ..$ fields    : NULL
  ..$ options   :List of 2
  .. ..$ useragent    : chr "libcurl/7.43.0 r-curl/0.9.7 httr/1.1.0"
  .. ..$ customrequest: chr "GET"
  ..$ auth_token: NULL
  ..$ output    : list()
  .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
  ..- attr(*, "class")= chr "request"
 $ handle     :Class 'curl_handle' <externalptr> 
 - attr(*, "class")= chr "response"
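Note that `status_code` in this output is 400, so the request was in fact rejected even though R raised no error, and the 237-byte `content` field presumably holds the server's explanation. A minimal sketch of decoding it with base R follows. The helper name `summarise_response` is hypothetical, and the demonstration uses only the four bytes visible in the output above rather than a live request.

```r
# Hypothetical helper: reports whether the status is a 2xx success
# and turns the raw response body into readable text.
summarise_response <- function(status_code, raw_body) {
  list(
    ok = status_code >= 200 && status_code < 300,
    body = rawToChar(raw_body)
  )
}

# Offline demonstration with the first four bytes shown above
# (7b 22 64 65); the remaining bytes are truncated in the output.
demo <- summarise_response(400L, as.raw(c(0x7b, 0x22, 0x64, 0x65)))
print(demo$ok)
print(demo$body)
```

With the real response object this would be `summarise_response(jesusComments$status_code, jesusComments$content)`. httr offers the same information through `stop_for_status(jesusComments)` and `content(jesusComments, as = "text")`.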

Here is a truncated version of what I was expecting:

{
  "debug": {
    "version": 3.1
  },
  "status": "OK",
  "copyright": "Copyright (c) 2016 The New York Times Company.  All Rights Reserved.",
  "results": {
    "comments": [
      {
        "commentID": 19695448,
        "status": "approved",
        "commentSequence": 19695448,
        "userID": 17571649,
        "userDisplayName": "Aunty W Bush",
        "userLocation": "Ohio",
        "userTitle": "NULL",
        "userURL": "NULL",
        "commentTitle": "<br/>",
        "commentBody": "Yeah, the New Pope is fresh air for this un-churchy guy.we need more examples like him.<br/>one encouraging note for me. In driving through the country 2 decades ago, the Christian radio seem filled with hate and war.<br/>the new generation of talk shows seems more into Christ's message of love and redemption.",
        "createDate": "1472992616",
        "updateDate": "1473001720",
        "approveDate": "1473001720",
        "recommendations": 0,
        "replyCount": 0,
        "replies": [],
        "editorsSelection": false,
        "parentID": null,
        "parentUserDisplayName": null,
        "depth": 1,
        "commentType": "comment",
        "trusted": 0,
        "recommendedFlag": 0,
        "reportAbuseFlag": 0,
        "permID": "19695448",
        "picURL": "https://graphics8.nytimes.com/images/apps/timespeople/none.png"
      },
      ....
    "page": 1,
    "totalCommentsReturned": 25,
    "totalCommentsFound": 639,
    "totalParentCommentsFound": 476,
    "totalParentCommentsReturned": 25,
    "totalReplyCommentsFound": 163,
    "totalReplyCommentsReturned": 0,
    "totalReporterReplyCommentsFound": 0,
    "totalReporterReplyCommentsReturned": 0,
    "totalEditorsSelectionFound": 18,
    "totalEditorsSelectionReturned": 1,
    "totalRecommendationsFound": 402,
    "totalRecommendationsReturned": 16,
    "replyLimit": 3,
    "depthLimit": 0,
    "sort": "oldest",
    "filter": "",
    "callerID": 4682550,
    "api_timestamp": "1473001942"

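The problem is that it throws no errors, but it returns no data either. It returns a bunch of metadata but no comments. I am not very experienced with APIs, so any help is much appreciated.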

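Update: I found the problem. My original code had a set of square brackets in it, so the URL was not being concatenated correctly. The issue now is that I need to be able to set the offset so that it returns one batch of comments, then the next batch, and so on, since that is how the Times API works. From what I found by googling, I should be able to set it to 100, then 200, then 300, and so on. Whenever I set an offset, I get status_code: 200, which means everything is fine, but no comments are returned.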
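The expected response shown earlier reports `totalCommentsReturned: 25` out of `totalCommentsFound: 639`, which suggests that a page holds 25 comments rather than 100. A minimal sketch of the offsets that would cover every page follows. It assumes offsets are zero-based and advance by the page size, which the output does not confirm, and `page_offsets` is a hypothetical helper name.

```r
# Hypothetical helper: offsets needed to walk through all comments.
# Assumption: offsets start at 0 and advance by the page size.
page_offsets <- function(total_found, page_size) {
  if (total_found <= 0) {
    return(integer(0))
  }
  as.integer(seq(0, total_found - 1, by = page_size))
}

# Values taken from the expected response: 639 found, 25 per page.
offsets <- page_offsets(639, 25)
print(length(offsets))  # 26 pages
print(range(offsets))   # first offset 0, last offset 625
```

Each value could then be used as the `offset` parameter of one request, keeping in mind the rate-limit headers in the response (5 requests per second, 1000 per day).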

json_file <- GET(url = 'http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=XXXXXXXXXXXXXXXX&url=http://www.nytimes.com/2016/09/04/opinion/sunday/what-religion-would-jesus-belong-to.html?ref=opinion&_r=0')

The code above returns only some of the comments, and the code below returns none. What am I doing wrong?

json_file <- GET(url = 'http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=00a5978d97934d4fb21e0265c82d844f&url=http://www.nytimes.com/2016/09/04/opinion/sunday/what-religion-would-jesus-belong-to.html?ref=opinion&_r=0&offset=100')
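One thing that may be worth ruling out: the article URL itself contains `?ref=opinion&_r=0`, and it is pasted unencoded into the request. A rough sketch of how a typical server would split such a query string follows. How the NYT API actually parses it is not known here, `split_query` is a hypothetical helper, and the key and article path are shortened placeholders.

```r
# Hypothetical helper: splits a request URL into its query parameters
# the way a typical server would (at "&", then at the first "=").
split_query <- function(request_url) {
  # Everything after the first "?" is treated as the query string.
  query <- sub("^[^?]*\\?", "", request_url)
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", pairs)       # text before the first "="
  values <- sub("^[^=]*=", "", pairs)  # text after the first "="
  names(values) <- keys
  values
}

params <- split_query(paste0(
  "http://api.nytimes.com/svc/community/v3/user-content/url.json",
  "?api-key=KEY",
  "&url=http://www.nytimes.com/a.html?ref=opinion&_r=0",
  "&offset=100"
))
print(names(params))    # "api-key" "url" "_r" "offset"
print(params[["url"]])  # the article URL stops before "&_r=0"
```

Under this reading, `_r=0` reaches the API as a separate top-level parameter and the `url` value loses its tail. Whether that explains the empty result is uncertain, but percent-encoding the article URL would remove the ambiguity.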

0 answers:

No answers yet