Question

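I'm new to Linux, curl and Python. I'm trying to scrape a website's content, but for some reason I get a "401 Unauthorized" error. I can open the site in a browser, but I can't open or scrape its content from Python or curl. Am I doing something wrong? I have checked my URL, username and password, and they are all correct, but I don't understand the problem. Please help.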

Python 2.7.8
curl 7.37.1

ubuntu@ubuntu:~/pythonSample$ curl -v -u kumar 'https://abc.def.co.in'
Enter host password for user 'kumar':
* Rebuilt URL to: https://abc.def.co.in/
* Hostname was NOT found in DNS cache
*   Trying xxx.xxx.xxx.xx...
* Connected to abc.def.co.in (xxx.xxx.xxx.xx) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using TLSv1.2 /
* Server certificate:
*    subject: OU=Domain Control Validated; CN=shareapp.def.co.in
*    start date: 2016-01-15 14:07:38 GMT
*    expire date: 2017-01-15 13:52:39 GMT
*    subjectAltName: abc.def.co.in matched
*    issuer: C=; ST=; L=; O=; OU=; CN=Go Daddy Secure Certificate Authority - G2
*    SSL certificate verify ok.
* Server auth using Basic with user 'kumar'
> GET / HTTP/1.1
> Authorization: Basic a3VtYXI6UmFodWwxMjM=
> User-Agent: curl/7.37.1
> Host: abc.def.co.in
> Accept: */*
> 
< HTTP/1.1 401 Unauthorized
< Cache-Control: private
< Content-Length: 16
< Content-Type: text/plain; charset=utf-8
< SPRequestGuid: 
< request-id: 
< X-FRAME-OPTIONS: SAMEORIGIN
< SPRequestDuration: 5
< SPIisLatency: 9
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< WWW-Authenticate: NTLM
< WWW-Authenticate: Negotiate
< X-Content-Type-Options: nosniff
< X-MS-InvokeApp: 1; RequireReadOnly
< MicrosoftSharePointTeamServices: xx.x.x.xxx
< Date: Mon, 25 Apr 2016 19:10:11 GMT
< 
* Connection #0 to host abc.def.co.in left intact
401 UNAUTHORIZED
ubuntu@ubuntu:~/pythonSample$ 
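The `Authorization: Basic ...` line in the trace above is what curl sends for `-u`, and it is also what requests sends when given `auth=(username, password)`: the username and password joined by a colon and base64-encoded. The sketch below rebuilds such a header; `basic_auth_header` is a hypothetical helper, and the credentials are placeholders, not the ones from the question.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header (hypothetical helper).

    Basic auth is only base64("username:password"): an encoding, not
    encryption, so anyone who sees the header can recover the password.
    """
    raw = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials for illustration only.
print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Note that in the same trace the server answers with `WWW-Authenticate: NTLM` and `WWW-Authenticate: Negotiate` and does not offer `Basic`, which suggests the Basic header is not an accepted way to log in to this server.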

docScraper.py 

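import requests
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2); not used yet

username = 'kumar@def.co.in'
password = 'Rahul123'
url = 'https://abc.def.co.in/'
# Passing a (user, password) tuple makes requests send HTTP Basic auth
r = requests.get(url, auth=(username, password))
page = r.content
print page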

ubuntu@ubuntu:~/pythonSample$ python docScraper.py 
401 UNAUTHORIZED
ubuntu@ubuntu:~/pythonSample$
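One way to see which login methods the server accepts is to read the `WWW-Authenticate` header of the 401 response. With requests this is `r.headers.get('WWW-Authenticate', '')`, and repeated headers are usually joined into one comma-separated string such as `"NTLM, Negotiate"`. The helper below is a hypothetical, deliberately simplified parser (not a full RFC 7235 implementation) that lists the advertised scheme names. If only NTLM or Negotiate appear, a Basic login will be rejected. curl's `--ntlm` option and the third-party `requests_ntlm` package (`HttpNtlmAuth`) are the usual tools for NTLM.

```python
def offered_auth_schemes(www_authenticate):
    """Return the auth scheme names advertised in a WWW-Authenticate value.

    Simplified assumption: split on commas and keep the first word of each
    piece unless it looks like a name=value auth-param. Quoted parameter
    values that themselves contain commas are not handled.
    """
    schemes = []
    for piece in www_authenticate.split(","):
        words = piece.strip().split()
        if not words:
            continue
        first = words[0]
        if "=" in first:
            # e.g. realm="..." or charset="UTF-8": a parameter, not a scheme
            continue
        schemes.append(first)
    return schemes


# The two WWW-Authenticate headers from the curl log, joined into one string.
schemes = offered_auth_schemes("NTLM, Negotiate")
print(schemes)
print("Basic offered: %s" % any(s.lower() == "basic" for s in schemes))
```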

401 Unauthorized error when scraping a website with curl or Python
