R: postForm from the RCurl package and making API calls

Asked: 2013-08-24 01:40:11

Tags: r api http curl rcurl

Does anyone have experience with limits on postForm from the RCurl package?

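I am pulling data from a server and, seemingly out of nowhere, I get the error message * HTTP 1.0, assume close after body, followed by 500 Internal Server Error. I tested the configuration and everything seems fine. I created a clean database and re-uploaded my data 20/30 cases at a time, repeatedly pulling the data through the API from R with postForm. Everything worked until I reached about 150 cases, at which point the error message appeared. Whatever order I upload in, the error appears at around 150/160 cases, with a total file size of around 11 to 12 MB. In other words, the error does not seem to depend on a specific case, because it is not the same case that breaks it each time.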

Any suggestions would be appreciated.
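One way to make the failure point easier to pin down is to wrap each export in tryCatch and log the outcome next to the number of cases uploaded so far. The sketch below is not from the original post: try_export, n_cases, and the two stub functions are hypothetical names, and the posting function is passed in as an argument so the wrapper can be dry-run with a stand-in before it is pointed at RCurl::postForm.

```r
# Hypothetical helper: run one export and record whether it succeeded.
# `post` defaults to RCurl::postForm, but the default is only looked up
# when it is actually needed, so a stub can be supplied for a dry run.
try_export <- function(n_cases, ..., post = RCurl::postForm) {
  tryCatch({
    body <- post(...)
    data.frame(n_cases = n_cases, ok = TRUE,
               bytes = nchar(body, type = "bytes"),
               error = NA_character_, stringsAsFactors = FALSE)
  }, error = function(e) {
    data.frame(n_cases = n_cases, ok = FALSE, bytes = NA_integer_,
               error = conditionMessage(e), stringsAsFactors = FALSE)
  })
}

# Dry run with stand-ins for a working and a failing server.
ok_stub   <- function(...) "id,name\n1,a\n"
fail_stub <- function(...) stop("Internal Server Error")

export_log <- rbind(try_export(140, post = ok_stub),
                    try_export(160, post = fail_stub))
print(export_log)
```

Against the real API, the call would look something like try_export(150, R.object.URL, token = R.object.token, content = "record", type = "flat", format = "csv", rawOrLabel = "Label"), repeated after each batch upload, so that the log shows the case count and response size at which the 500 first appears.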

I have attached a screenshot to liven up this rather dull post and to make up for not having a working example.


Update 2013-08-24 19:33:18Z

Here is my curlVersion()$version and sessionInfo() output:

> curlVersion()$version
[1] "7.22.0"
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RCurl_1.95-4.1 bitops_1.0-6

Update 2013-08-26 05:39:26Z

Following the suggestion in hadley's comment, I have added the verbose RCurl output for a call that works and for a call that fails; see below.

Call that works, with fewer than 150 cases in the database

> R.object.API <- postForm(R.object.URL, token=R.object.token, content="record", type="flat", format="csv", rawOrLabel="Label", .opts=curlOptions(ssl.verifypeer=TRUE, cainfo=R.object.crt, verbose=TRUE))
* About to connect() to research.org port 443 (#0)
*   Trying xx.xx.xxx.xxx... * connected
* successfully set certificate verify locations:
*   CAfile: /home/dir/research.cert
  CApath: /etc/ssl/certs
* SSL connection using DHE-RSA-AES256-SHA
* Server certificate:
*      subject: C=XX; postalCode=XXXXX-XXXX; ST=XX; L=XXXXXX; street=XXX; street=XX XXXXXX XX; O=XXXX, XXX; OU=XXX; CN=research.org
*      start date: 2013-02-04 00:00:00 GMT
*      expire date: 2016-02-04 23:59:59 GMT
*      subjectAltName: research.org matched
*      issuer: C=US; O=XXXXXX; OU=XXXXXX; CN=XXXXXX Server XX
*      SSL certificate verify ok.
> POST /api/ HTTP/1.1
Host: research.org
Accept: */*
Content-Length: 573
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------XXXXXXXXXXXX

< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Date: Mon, 26 Aug 2013 05:16:44 GMT
< Server: Apache/2.2.15 (Red Hat)
< X-Powered-By: PHP/5.3.3
< Expires: 0
< cache-control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=utf-8
< 
* Closing connection #0
> 

Call that fails, with more than 150 cases in the database

> R.object.API <- postForm(R.object.URL, token=R.object.token, content="record", type="flat", format="csv", rawOrLabel="Label", .opts=curlOptions(ssl.verifypeer=TRUE, cainfo=R.object.crt, verbose=TRUE))
* About to connect() to research.org port 443 (#0)
*   Trying xx.xx.xxx.xxx... * connected
* successfully set certificate verify locations:
*   CAfile: /home/dir/research.cert
  CApath: /etc/ssl/certs
* SSL connection using DHE-RSA-AES256-SHA
* Server certificate:
*      subject: C=XX; postalCode=XXXXX-XXXX; ST=XX; L=XXXXXX; street=XXX; street=XX XXXXXX XX; O=XXXX, XXX; OU=XXX; CN=research.org
*      start date: 2013-02-04 00:00:00 GMT
*      expire date: 2016-02-04 23:59:59 GMT
*      subjectAltName: research.org matched
*      issuer: C=US; O=XXXXXX; OU=XXXXXX; CN=XXXXXX Server XX
*      SSL certificate verify ok.
> POST /api/ HTTP/1.1
Host: research.org
Accept: */*
Content-Length: 573
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------XXXXXXXXXXXX

< HTTP/1.1 100 Continue
* HTTP 1.0, assume close after body
< HTTP/1.0 500 Internal Server Error
< Date: Mon, 26 Aug 2013 05:15:05 GMT
< Server: Apache/2.2.15 (Red Hat)
< X-Powered-By: PHP/5.3.3
< Expires: 0
< cache-control: no-store, no-cache, must-revalidate
< Pragma: no-cache
< Content-Length: 276
< Connection: close
< Content-Type: text/html; charset=UTF-8
< 
* Closing connection #0
Error: Internal Server Error
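The failing response above carries a 276-byte body (Content-Length: 276) that is not shown once postForm raises the error, and that body may contain the server-side message. A minimal sketch for capturing it follows, assuming the server also accepts a URL-encoded POST rather than only multipart/form-data. form_body and fetch_with_body are hypothetical helper names; curlPerform, basicTextGatherer, and basicHeaderGatherer are RCurl functions, and the network call is wrapped in a function so that nothing is sent until it is called.

```r
# Build an application/x-www-form-urlencoded body from a named list of fields.
form_body <- function(fields) {
  encoded <- vapply(
    fields,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste(names(fields), encoded, sep = "=", collapse = "&")
}

# Assumption: the API accepts URL-encoded form data. curlPerform does not
# stop on an HTTP 500, so the headers and body can be inspected afterwards.
fetch_with_body <- function(url, fields, cainfo) {
  body   <- RCurl::basicTextGatherer()
  header <- RCurl::basicHeaderGatherer()
  RCurl::curlPerform(url = url,
                     postfields = form_body(fields),
                     writefunction = body$update,
                     headerfunction = header$update,
                     ssl.verifypeer = TRUE,
                     cainfo = cainfo)
  list(header = header$value(), body = body$value())
}

# The body that would be sent for the export in the question (token omitted).
example_body <- form_body(list(content = "record", type = "flat",
                               format = "csv", rawOrLabel = "Label"))
print(example_body)
```

Calling fetch_with_body(R.object.URL, list(token = R.object.token, content = "record", type = "flat", format = "csv", rawOrLabel = "Label"), R.object.crt) should then return the status line and the text of the error page, which is usually more informative than the bare "Internal Server Error".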

2 Answers:

Answer 0 (score: 1)

This does not answer your question, but it is relevant to options and keep-alive:

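RCurl uses the libcurl library, which is not the same thing as the curl command-line tool. You need to look at the libcurl options; CURLOPT_TCP_KEEPALIVE may be what you want. In RCurl it will be listed as tcp.keepalive in listCurlOptions() if it exists in libcurl.

According to the easyopt man page, this option was added in 7.25.0. You can check which version of libcurl RCurl is using by running

> curlVersion()$version
[1] "7.22.0"

Unfortunately, the version of libcurl that RCurl is using here does not handle keep-alive.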
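The version check described in this answer can be sketched in a few lines of R. has_tcp_keepalive is a hypothetical helper name; the 7.25.0 threshold is the one quoted from the man page above, and the live check only runs if RCurl is installed.

```r
# TRUE if a libcurl version string is at least 7.25.0, the release in which
# CURLOPT_TCP_KEEPALIVE was added according to the man page cited above.
has_tcp_keepalive <- function(version) {
  utils::compareVersion(version, "7.25.0") >= 0
}

print(has_tcp_keepalive("7.22.0"))  # the version reported in the question
print(has_tcp_keepalive("7.25.0"))

# With RCurl installed, run the same check against the linked libcurl and
# see whether the option name is exposed at all.
if (requireNamespace("RCurl", quietly = TRUE)) {
  print(has_tcp_keepalive(RCurl::curlVersion()$version))
  print("tcp.keepalive" %in% RCurl::listCurlOptions())
}
```

If both live checks come back TRUE, the option could presumably be passed through curlOptions like any other; with 7.22.0, as in the question, it is not available.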

Answer 1 (score: 0)

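I am putting this in an answer because it is easier to format. My suggestion is to save the following in a file, fill in the form address (i.e. change WHEREVER_YOU_ARE_TRYING_TO_POST_DATA to the appropriate address), name it test.html, and open it in a browser.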

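I have filled in most of the values based on your example above, but I do not know what the token field should contain; that depends on your particular problem.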

<!DOCTYPE html>
<html>
<head></head>
<body>   
<div class="content">
    <form action="WHEREVER_YOU_ARE_TRYING_TO_POST_DATA" method="post">
        token: <input name="token" type="text" size="100" /><br />
        content: <input name="content" type="text" size="100" value="record" /><br />
        type: <input name="type" type="text" size="100" value="flat"/><br />
        format: <input name="format" type="text" size="100" value="csv"/><br />
        rawOrLabel: <input name="rawOrLabel" type="text" size="100" value="Label"/><br />
        <input name="Submit" type="submit" value="submit" />
    </form>
</div>
</body>
</html>

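If submitting this form works but your postForm code does not, then something odd is going on in your R code. If both fail, you have a deeper problem that probably has nothing to do with R.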
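One caveat when comparing the two: an HTML form without an enctype attribute, like the one above, submits as application/x-www-form-urlencoded, whereas the verbose output in the question shows postForm sending multipart/form-data. To compare like with like, postForm's style argument can be set to "POST". The sketch below only assembles the arguments; export_args is a hypothetical helper, and the URL and token are placeholders, not values from the original post.

```r
# Assemble the arguments for a URL-encoded postForm call that mirrors the
# HTML form above. style = "POST" asks RCurl for
# application/x-www-form-urlencoded instead of multipart/form-data.
export_args <- function(url, token, style = "POST") {
  list(uri = url, token = token, content = "record", type = "flat",
       format = "csv", rawOrLabel = "Label", style = style)
}

# Placeholder URL and token, for illustration only.
call_args <- export_args("https://example.org/api/", token = "PLACEHOLDER")
str(call_args)

# The actual request, left commented out because it needs a real endpoint:
# result <- do.call(RCurl::postForm, call_args)
```

If the URL-encoded call and the browser form behave the same way, the difference between the browser and R is unlikely to lie in the encoding of the request.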