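I'm trying to do some NLP on reader comments on New York Times articles. I've obtained my Community API key and am following the examples on the Times website and in the rtimes package (which doesn't actually have Community API functionality), but although my script doesn't throw an error, it doesn't return any data either.
Here is the GET request the Times suggests for fetching comments by article URL: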
http://api.nytimes.com/svc/community/{version}/user-content/url.json?api-key={your-API-key}&url={url}[&offset=int]
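In a template like the one above, the curly braces mark placeholders and the square brackets mark an optional parameter whose value is an integer, so neither the brackets nor the word int belong in the final URL. A minimal sketch of filling in the template with modify_url() from httr, which percent-encodes each query value; build_comments_url is a hypothetical helper name, and the v3 version string is taken from the attempt in this post rather than from the API documentation:

```r
library(httr)

# Build the request URL from its parts. modify_url() percent-encodes each
# query value, so characters such as "?" and "&" inside the article URL
# cannot be mistaken for separators between API parameters.
# build_comments_url is a hypothetical helper, not part of httr or rtimes.
build_comments_url <- function(api_key, article_url, offset = NULL) {
  query <- list(`api-key` = api_key, url = article_url)
  if (!is.null(offset)) {
    # The optional offset is sent as a plain integer, e.g. "25".
    query$offset <- sprintf("%d", as.integer(offset))
  }
  modify_url(
    "http://api.nytimes.com/svc/community/v3/user-content/url.json",
    query = query
  )
}
```

Printing the result of build_comments_url() before passing it to GET() makes it easy to confirm that the key and the article URL were substituted as values rather than sent as literal variable names.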
So this is what I tried:
library(httr)
library(RJSONIO)
library(RCurl)
jesusComments <- GET(paste0('http://api.nytimes.com/svc/community/v3/user-content/url.json?', 'api-key=', communityAPI, '&url=', q, '&offset=int'))
communityAPI is my key, and q is the URL of the article whose comments I'm trying to get. This is what it returns:
> str(jesusComments)
List of 10
$ url : chr "http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=communityAPI&url=http://www.nytimes.c"| __truncated__
$ status_code: int 400
$ headers :List of 17
..$ cache-control : chr "max-age=10"
..$ content-type : chr "application/json; charset=UTF-8"
..$ date : chr "Sun, 04 Sep 2016 13:33:20 GMT"
..$ expires : chr "Sun, 04 Sep 2016 13:33:30 GMT"
..$ last-modified : chr "Sun, 04 Sep 2016 13:33:20"
..$ pragma : chr "cache"
..$ server : chr "nginx/1.10.1"
..$ via : chr "kong/0.8.3"
..$ x-kong-proxy-latency : chr "2"
..$ x-kong-upstream-latency : chr "31"
..$ x-powered-by : chr "PHP/5.5.30"
..$ x-ratelimit-limit-day : chr "1000"
..$ x-ratelimit-limit-second : chr "5"
..$ x-ratelimit-remaining-day : chr "939"
..$ x-ratelimit-remaining-second: chr "4"
..$ content-length : chr "237"
..$ connection : chr "keep-alive"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
$ all_headers:List of 1
..$ :List of 3
.. ..$ status : int 400
.. ..$ version: chr "HTTP/1.1"
.. ..$ headers:List of 17
.. .. ..$ cache-control : chr "max-age=10"
.. .. ..$ content-type : chr "application/json; charset=UTF-8"
.. .. ..$ date : chr "Sun, 04 Sep 2016 13:33:20 GMT"
.. .. ..$ expires : chr "Sun, 04 Sep 2016 13:33:30 GMT"
.. .. ..$ last-modified : chr "Sun, 04 Sep 2016 13:33:20"
.. .. ..$ pragma : chr "cache"
.. .. ..$ server : chr "nginx/1.10.1"
.. .. ..$ via : chr "kong/0.8.3"
.. .. ..$ x-kong-proxy-latency : chr "2"
.. .. ..$ x-kong-upstream-latency : chr "31"
.. .. ..$ x-powered-by : chr "PHP/5.5.30"
.. .. ..$ x-ratelimit-limit-day : chr "1000"
.. .. ..$ x-ratelimit-limit-second : chr "5"
.. .. ..$ x-ratelimit-remaining-day : chr "939"
.. .. ..$ x-ratelimit-remaining-second: chr "4"
.. .. ..$ content-length : chr "237"
.. .. ..$ connection : chr "keep-alive"
.. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
$ cookies :'data.frame': 0 obs. of 7 variables:
..$ domain : logi(0)
..$ flag : logi(0)
..$ path : logi(0)
..$ secure : logi(0)
..$ expiration:Classes 'POSIXct', 'POSIXt' num(0)
..$ name : logi(0)
..$ value : logi(0)
$ content : raw [1:237] 7b 22 64 65 ...
$ date : POSIXct[1:1], format: "2016-09-04 13:33:20"
$ times : Named num [1:6] 0 0.0958 0.1886 0.1887 0.3195 ...
..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
$ request :List of 7
..$ method : chr "GET"
..$ url : chr "http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=communityAPI&url=http://www.nytimes.c"| __truncated__
..$ headers : Named chr "application/json, text/xml, application/xml, */*"
.. ..- attr(*, "names")= chr "Accept"
..$ fields : NULL
..$ options :List of 2
.. ..$ useragent : chr "libcurl/7.43.0 r-curl/0.9.7 httr/1.1.0"
.. ..$ customrequest: chr "GET"
..$ auth_token: NULL
..$ output : list()
.. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
..- attr(*, "class")= chr "request"
$ handle :Class 'curl_handle' <externalptr>
- attr(*, "class")= chr "response"
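The status_code of 400 in the output above is the HTTP code for a bad request, which httr reports without raising an R error, and the 237-byte content element is still raw bytes. A minimal sketch of decoding such a body to read whatever message it carries, assuming the body is JSON (as the content-type header suggests) and using jsonlite in place of RJSONIO; decode_body and is_http_error are hypothetical helper names:

```r
library(jsonlite)

# Decode a raw response body (such as the content element of an httr
# response) into a nested R list. Returns NULL if the body is not valid JSON.
# decode_body is a hypothetical helper, not part of httr.
decode_body <- function(raw_body) {
  text <- rawToChar(raw_body)
  Encoding(text) <- "UTF-8"
  tryCatch(
    parse_json(text),
    error = function(e) NULL
  )
}

# Hypothetical helper: TRUE when the HTTP status is a client or server error.
is_http_error <- function(status_code) {
  status_code >= 400
}
```

With the response shown here, is_http_error(jesusComments$status_code) and str(decode_body(jesusComments$content)) would be the two calls to try first.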
This is a truncated version of what I was expecting:
{
"debug": {
"version": 3.1
},
"status": "OK",
"copyright": "Copyright (c) 2016 The New York Times Company. All Rights Reserved.",
"results": {
"comments": [
{
"commentID": 19695448,
"status": "approved",
"commentSequence": 19695448,
"userID": 17571649,
"userDisplayName": "Aunty W Bush",
"userLocation": "Ohio",
"userTitle": "NULL",
"userURL": "NULL",
"commentTitle": "<br/>",
"commentBody": "Yeah, the New Pope is fresh air for this un-churchy guy.we need more examples like him.<br/>one encouraging note for me. In driving through the country 2 decades ago, the Christian radio seem filled with hate and war.<br/>the new generation of talk shows seems more into Christ's message of love and redemption.",
"createDate": "1472992616",
"updateDate": "1473001720",
"approveDate": "1473001720",
"recommendations": 0,
"replyCount": 0,
"replies": [],
"editorsSelection": false,
"parentID": null,
"parentUserDisplayName": null,
"depth": 1,
"commentType": "comment",
"trusted": 0,
"recommendedFlag": 0,
"reportAbuseFlag": 0,
"permID": "19695448",
"picURL": "https://graphics8.nytimes.com/images/apps/timespeople/none.png"
},
....
"page": 1,
"totalCommentsReturned": 25,
"totalCommentsFound": 639,
"totalParentCommentsFound": 476,
"totalParentCommentsReturned": 25,
"totalReplyCommentsFound": 163,
"totalReplyCommentsReturned": 0,
"totalReporterReplyCommentsFound": 0,
"totalReporterReplyCommentsReturned": 0,
"totalEditorsSelectionFound": 18,
"totalEditorsSelectionReturned": 1,
"totalRecommendationsFound": 402,
"totalRecommendationsReturned": 16,
"replyLimit": 3,
"depthLimit": 0,
"sort": "oldest",
"filter": "",
"callerID": 4682550,
"api_timestamp": "1473001942"
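The problem is that it doesn't throw any errors, but it doesn't return any data either. It returns only a bunch of metadata, with no comments. I'm not very experienced with APIs, so any help is greatly appreciated.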
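Update: I've figured out the problem. There was a set of brackets in my original code, so it was not connecting. The issue now is that I need to be able to set the offset so that one batch of comments is returned, then the next batch, and so on, since that is how the Times API works. From what I've found by googling, I should be able to set it to 100, then 200, then 300, and so on. Whenever I set an offset, it gives status_code: 200, which means everything is fine, but no comments are returned.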
json_file <- GET(url = 'http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=XXXXXXXXXXXXXXXX&url=http://www.nytimes.com/2016/09/04/opinion/sunday/what-religion-would-jesus-belong-to.html?ref=opinion&_r=0')
The code above returns only some of the comments; the code below returns none. What am I doing wrong?
json_file <- GET(url = 'http://api.nytimes.com/svc/community/v3/user-content/url.json?api-key=00a5978d97934d4fb21e0265c82d844f&url=http://www.nytimes.com/2016/09/04/opinion/sunday/what-religion-would-jesus-belong-to.html?ref=opinion&_r=0&offset=100')
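One way to approach the paging described above is sketched below. It assumes each response holds 25 comments (inferred from totalCommentsReturned in the expected output, not from the API documentation) and that offset counts comments from 0 in steps of that page size; both are assumptions, and page_offsets and fetch_comment_page are hypothetical helpers. The article URL in the two calls above contains its own "?" and "&", so when it is pasted in unencoded, the server may not see the url and offset parameters as intended. Passing the parameters through the query argument of GET() lets httr percent-encode them.

```r
library(httr)

# Offsets for every page, assuming each response holds `page_size` comments.
# The default of 25 is an assumption based on totalCommentsReturned.
page_offsets <- function(total_found, page_size = 25) {
  if (total_found <= 0) {
    return(integer(0))
  }
  as.integer(seq(0, total_found - 1, by = page_size))
}

# Fetch one page of comments and return the parsed JSON as a nested list.
# Parameters go through `query`, so httr percent-encodes the article URL.
fetch_comment_page <- function(api_key, article_url, offset = 0) {
  resp <- GET(
    "http://api.nytimes.com/svc/community/v3/user-content/url.json",
    query = list(
      `api-key` = api_key,
      url = article_url,
      offset = sprintf("%d", as.integer(offset))
    )
  )
  stop_for_status(resp)  # turn 4xx/5xx responses into R errors
  content(resp, as = "parsed", type = "application/json")
}
```

A first call with offset 0 would give totalCommentsFound under results, which can then be passed to page_offsets() to loop over the remaining pages. The x-ratelimit-limit-second header of 5 in the earlier output suggests pausing between calls, for example with Sys.sleep().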