感兴趣的网址是:
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/PTO/search-adv.htm&r=10&f=G&l=50&d=PTXT&OS=AN/(nortel)&RS=AN/nortel&Query=AN/(nortel)&Srch1=nortel.ASNM.&NextList1=Next 50 Hits
测试其存在的所选函数是:
> url.exists("http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/PTO/search-adv.htm&r=10&f=G&l=50&d=PTXT&OS=AN/(nortel)&RS=AN/nortel&Query=AN/(nortel)&Srch1=nortel.ASNM.&NextList1=Next 50 Hits")
[1] FALSE
为什么不工作?该网址明确存在并以Chrome格式解析,并且使用网址上的htmlTreeParse
工作正常。
答案 0 :(得分:5)
我的猜测是url.exists
正在使用HTTP HEAD请求,服务器似乎无法处理:
$ telnet patft.uspto.gov 80
Trying 151.207.240.26...
Connected to patft.uspto.gov.
Escape character is '^]'.
HEAD /netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/PTO/search-adv.htm&r=10&f=G&l=50&d=PTXT&OS=AN/(nortel)&RS=AN/nortel&Query=AN/(nortel)&Srch1=nortel.ASNM.&NextList1=Next+50+Hits HTTP/1.1
Host: patft.uspto.gov
Connection: close
Connection closed by foreign host.
所以服务器坏了,而不是RCurl。