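I'm trying to scrape web pages through a proxy, but something isn't working. Below is my attempt at setting the proxy options with httr, followed by an attempt with RCurl. Both get `407 Proxy Authentication Required` back from the proxy. I've read several answers on the topic, but none of them seem to work. Any suggestions?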
### httr attempt
```r
set_config(
  use_proxy(url = "proxy.xxx.com.ar", port = 8080,
            username = "xxxx\\xxxx", password = "xxxxx"),
  override = TRUE
)
a <- GET("http://google.com/", verbose())
```

```
-> GET http://google.com/ HTTP/1.1
-> Proxy-Authorization: Basic dG1vdmlsZXNcbWFyYmVsOkFyYWNhbGFjYW5hMjM=
-> User-Agent: curl/7.19.7 Rcurl/1.95.4.1 httr/0.4.0.99
-> Host: google.com
-> Accept: */*
-> Accept-Encoding: gzip
-> Proxy-Connection: Keep-Alive
->
<- HTTP/1.1 407 Proxy Authentication Required
<- Server: pxsip02-srv.xxxxx.com.ar
<- Date: Mon, 11 Aug 2014 15:11:14 GMT
<- Content-Length: 309
<- Content-Type: text/html
<- Connection: Keep-Alive
<- Keep-Alive: timeout=60, max=8
<- Proxy-Authenticate: NTLM
<-
```

```r
content(a)
```

```
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Authentication Error</title></head>
<body>
<h1>Authentication Error</h1>There has been an error validating your user credentials. If the error persists,contact your network administrator.<br>Proxy authentication required<br><hr>
<br>Details: 407 Proxy Authentication Required</body>
</html>
```
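One thing the log above does show: the request went out with `Proxy-Authorization: Basic ...`, while the proxy's 407 reply says `Proxy-Authenticate: NTLM`, so the proxy appears to accept only NTLM. A minimal sketch of asking httr for NTLM instead, assuming a recent httr release in which `use_proxy()` has an `auth` argument (the httr 0.4 build in this log does not). The host and credentials are the masked values from the question; `fetch_via_proxy` is a hypothetical helper name.

```r
# Sketch: request NTLM (not Basic) authentication against the proxy.
# Assumes a recent httr whose use_proxy() accepts `auth`; httr 0.4 does not.
library(httr)

ntlm_proxy <- use_proxy(
  url      = "proxy.xxx.com.ar",  # masked proxy host from the question
  port     = 8080,
  username = "xxxx\\xxxx",        # DOMAIN\user, masked as in the question
  password = "xxxxx",
  auth     = "ntlm"               # translated to libcurl's CURLAUTH_NTLM
)

# Hypothetical helper: only performs the request when called.
fetch_via_proxy <- function(url) {
  GET(url, ntlm_proxy, verbose())
}

# Usage (requires network access to the proxy):
# a <- fetch_via_proxy("http://google.com/")
# status_code(a)
```

NTLM also requires that the libcurl underneath R was built with NTLM support.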
### RCurl attempt
```r
library("RCurl")
opts <- list(
  proxy = "proxy.xxxxx.com.ar",
  proxyusername = "xxxxxx\\xxxxx",
  proxypassword = "xxxxxx",
  proxyport = 8080,
  capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
  verbose = TRUE, proxyauth = TRUE, useragent = "", header = TRUE
)
options(RCurlOptions = opts)
getURL("http://stackoverflow.com")
```

```
* About to connect() to proxy proxy.xxxxx.com.ar port 8080 (#0)
* Trying 10.167.195.11... * connected
* Connected to proxy.xxxxxx.com.ar (10.167.195.11) port 8080 (#0)
* Proxy auth using Basic with user 'xxxxxxx\xxxxx'
> GET http://stackoverflow.com HTTP/1.1
Proxy-Authorization: Basic VE1PVklMRVNcTUFSQkVMOkFyYWNhbGFjYW5hMjM=
Host: stackoverflow.com
Accept: */*
Proxy-Connection: Keep-Alive
[1] "HTTP/1.1 407 Proxy Authentication Required\r\nServer: pxsip02-srv.xxxx.com.ar\r\nDate: Mon, 11 Aug 2014 15:15:29 GMT\r\nContent-Length: 309\r\nContent-Type: text/html\r\nConnection: Keep-Alive\r\nKeep-Alive: timeout=60, max=8\r\nProxy-Authenticate: NTLM\r\n\r\n<html><head><title>Authentication Error</title></head><body><h1>Authentication Error</h1>There has been an error validating your user credentials. If the error persists,contact your network administrator.<br/>Proxy authentication required<br/><hr/><br/>Details: 407 Proxy Authentication Required</body></html>"
< HTTP/1.1 407 Proxy Authentication Required
< Server: pxsip02-srv.xxxxxxx.com.ar
< Date: Mon, 11 Aug 2014 15:15:29 GMT
< Content-Length: 309
< Content-Type: text/html
< Connection: Keep-Alive
< Keep-Alive: timeout=60, max=8
< Proxy-Authenticate: NTLM
<
* Connection #0 to host proxy.xxxxxx.com.ar left intact
```
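The verbose line `Proxy auth using Basic` suggests that `proxyauth = TRUE` ended up as libcurl's `CURLAUTH_BASIC` (value 1), while the proxy answers with `Proxy-Authenticate: NTLM`. A minimal sketch of the same RCurl options requesting NTLM instead, using the libcurl bitmask value 8 (`CURLAUTH_NTLM`). It assumes the libcurl that RCurl links against supports NTLM; `ntlm_opts` and `fetch_via_proxy` are hypothetical names, and the host and credentials are the masked values from the question.

```r
# Sketch: same proxy settings, but ask libcurl for NTLM proxy authentication.
# libcurl bitmask values: CURLAUTH_BASIC = 1, CURLAUTH_NTLM = 8.
CURLAUTH_NTLM <- 8L

ntlm_opts <- list(
  proxy         = "proxy.xxxxx.com.ar",  # masked host from the question
  proxyport     = 8080,
  proxyusername = "xxxxxx\\xxxxx",       # DOMAIN\user, masked
  proxypassword = "xxxxxx",
  proxyauth     = CURLAUTH_NTLM,
  verbose       = TRUE
)

# Hypothetical helper: passes the options per call via getURL()'s .opts
# instead of the global options(RCurlOptions = ...).
fetch_via_proxy <- function(url) {
  RCurl::getURL(url, .opts = ntlm_opts)
}

# Usage (requires network access to the proxy):
# fetch_via_proxy("http://stackoverflow.com")
```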
### Answer
The following is an update to the question above. I'm posting it as a separate answer so it's easier to follow.
```r
GET("http://google.com/",
    config = list(
      use_proxy(url = "proxy.xxx.com.ar", port = 8080,
                username = "xxxx\\xxxx", password = "xxxxx",
                proxyauth = 1)
    )
)
```

Error message:

```
Error in use_proxy(url = "proxy.xxxx.com.ar", port = 8080, username = "xxxxxx\\xxxx", :
  unused argument (proxyauth = 1)
```
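The error says that `use_proxy()` has no `proxyauth` parameter. `proxyauth` is a libcurl option, and httr takes raw curl options through `config()`, so one way to combine them is to pass the proxy settings and the auth option as two separate configs. A minimal sketch, assuming the proxy wants NTLM (as its `Proxy-Authenticate: NTLM` header suggests) and that 8 is libcurl's `CURLAUTH_NTLM`; note that `1`, as in the attempt above, would be `CURLAUTH_BASIC`. `fetch_via_proxy` is a hypothetical helper, and the host and credentials are the masked values from the question.

```r
# Sketch: proxy host/credentials via use_proxy(), auth scheme via config().
library(httr)

proxy_cfg <- use_proxy(url = "proxy.xxx.com.ar", port = 8080,
                       username = "xxxx\\xxxx", password = "xxxxx")
ntlm_cfg  <- config(proxyauth = 8L)  # 8 = CURLAUTH_NTLM in libcurl

# Hypothetical helper: ntlm_cfg is passed after proxy_cfg so that, in recent
# httr where use_proxy() itself sets proxyauth, the later value is used.
fetch_via_proxy <- function(url) {
  GET(url, proxy_cfg, ntlm_cfg, verbose())
}

# Usage (requires network access to the proxy):
# a <- fetch_via_proxy("http://google.com/")
```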