I'm new to Linux, curl and Python. I'm trying to scrape a website's content, but for some reason I get a "401 Unauthorized" error. I can open the site in a browser, but I can't open/scrape it from Python or curl. Am I doing something wrong? I've checked my URL, username and password and they are all correct, but I can't figure out the problem. Please help.
Python 2.7.8
curl 7.37.1
ubuntu@ubuntu:~/pythonSample$ curl -v -u kumar 'https://abc.def.co.in'
Enter host password for user 'kumar':
* Rebuilt URL to: https://abc.def.co.in/
* Hostname was NOT found in DNS cache
* Trying xxx.xxx.xxx.xx...
* Connected to abc.def.co.in (xxx.xxx.xxx.xx) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using TLSv1.2 /
* Server certificate:
* subject: OU=Domain Control Validated; CN=shareapp.def.co.in
* start date: 2016-01-15 14:07:38 GMT
* expire date: 2017-01-15 13:52:39 GMT
* subjectAltName: abc.def.co.in matched
* issuer: C=; ST=; L=; O=; OU=; CN=Go Daddy Secure Certificate Authority - G2
* SSL certificate verify ok.
* Server auth using Basic with user 'kumar'
> GET / HTTP/1.1
> Authorization: Basic a3VtYXI6UmFodWwxMjM=
> User-Agent: curl/7.37.1
> Host: abc.def.co.in
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Cache-Control: private
< Content-Length: 16
< Content-Type: text/plain; charset=utf-8
< SPRequestGuid:
< request-id:
< X-FRAME-OPTIONS: SAMEORIGIN
< SPRequestDuration: 5
< SPIisLatency: 9
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< WWW-Authenticate: NTLM
< WWW-Authenticate: Negotiate
< X-Content-Type-Options: nosniff
< X-MS-InvokeApp: 1; RequireReadOnly
< MicrosoftSharePointTeamServices: xx.x.x.xxx
< Date: Mon, 25 Apr 2016 19:10:11 GMT
<
* Connection #0 to host abc.def.co.in left intact
401 UNAUTHORIZED
ubuntu@ubuntu:~/pythonSample$
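One thing the verbose output already shows: the server answers with `WWW-Authenticate: NTLM` and `WWW-Authenticate: Negotiate` (not Basic), and the `MicrosoftSharePointTeamServices` header suggests a SharePoint site that expects Windows (NTLM/Kerberos) authentication rather than HTTP Basic. As a sanity check, the Basic header curl sent can be decoded to confirm exactly which credentials went over the wire (using only the value from the log above):

```python
import base64

# Authorization header value copied from the curl trace above
header_value = 'a3VtYXI6UmFodWwxMjM='

# A Basic credential is just base64("user:password")
decoded = base64.b64decode(header_value).decode('ascii')
user, _, password = decoded.partition(':')
print(decoded)  # -> kumar:Rahul123
```

Note also that the curl run sent the user `kumar`, while docScraper.py below uses `kumar@def.co.in`, so the two attempts are not even sending the same credentials.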
docScraper.py
import requests
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; imported but not used yet

username = 'kumar@def.co.in'
password = 'Rahul123'
url = 'https://abc.def.co.in/'

# A (user, password) tuple makes requests send HTTP Basic auth
r = requests.get(url, auth=(username, password))
page = r.content
print page
ubuntu@ubuntu:~/pythonSample$ python docScraper.py
401 UNAUTHORIZED
ubuntu@ubuntu:~/pythonSample$
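Since the 401 response advertises only `NTLM` and `Negotiate`, a Basic-auth request like the ones above will be rejected no matter how correct the password is. A possible next step, as a sketch, is to ask curl to perform the NTLM handshake itself (assumption: the account is a Windows/Active Directory account; the `DOMAIN\` prefix is hypothetical and should be replaced with the real AD domain, if any):

```shell
# Retry with NTLM instead of Basic; curl negotiates the challenge/response.
# The DOMAIN\ prefix is an assumption -- adjust to the actual AD domain.
curl -v --ntlm -u 'DOMAIN\kumar' 'https://abc.def.co.in/'

# A Python requests equivalent needs a third-party helper such as
# requests_ntlm (assumption: installed via `pip install requests-ntlm`):
#   from requests_ntlm import HttpNtlmAuth
#   r = requests.get(url, auth=HttpNtlmAuth('DOMAIN\\kumar', password))
```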