HTTP Error 403: Forbidden, even with a User-Agent added

Time: 2015-08-24 19:39:46

Tags: python http web web-scraping http-status-code-403

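I can usually get past 403 errors once I add a known User-Agent, but now I am trying to log in first and then scrape, and I cannot figure out how to get around this error.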

Code:

import urllib
import http.cookiejar

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
authentication_url = 'https://www.linkedin.com/'
payload = {
    'session_key': 'email',
    'session_password': 'password'
}
data = urllib.parse.urlencode(payload)
binary_data = data.encode('UTF-8')
req = urllib.request.Request(authentication_url, binary_data)
resp = urllib.request.urlopen(req)
contents = resp.read()

Traceback:

Traceback (most recent call last):
  File "C:/Python34/loginLinked.py", line 16, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
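
When chasing a 403 like the one above, it can help to catch the exception and look at what the server sent back instead of letting the traceback end the script. The following is a minimal sketch; `describe_http_error` and `fetch` are hypothetical helper names, not part of the original code.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarise an HTTPError: status code, reason, URL and response headers."""
    lines = ["{} {} for {}".format(err.code, err.reason, err.url)]
    # err.headers is a mapping-like object holding the response headers.
    for name, value in err.headers.items():
        lines.append("{}: {}".format(name, value))
    return "\n".join(lines)


def fetch(req):
    """Open a request; on an HTTP error, print the details and re-raise."""
    try:
        return urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        print(describe_http_error(err))
        raise
```

Calling `fetch(req)` in place of `urllib.request.urlopen(req)` keeps the same behaviour but prints the status line and headers first, which sometimes hints at why the request was refused.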

1 answer:

Answer 0 (score: 1)

See my answer to this question:

why isn't Requests not signing into a website correctly?

I should start by saying that you really should use their API: http://developer.linkedin.com/apis

There does not seem to be any POST login on LinkedIn's front page that uses those parameters?

This is the login URL you have to POST to: https://www.linkedin.com/uas/login-submit
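
For anyone staying with `urllib` as in the question, here is a rough sketch of building the POST against that URL. It only constructs the request and sends nothing; `build_login_request` is a hypothetical helper, and the form very likely needs more fields than the two shown in the usage line.

```python
import urllib.parse
import urllib.request

LOGIN_URL = "https://www.linkedin.com/uas/login-submit"  # URL given in the answer


def build_login_request(fields, user_agent="Mozilla/5.0"):
    """Return a POST Request carrying form-encoded `fields`; nothing is sent."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    req = urllib.request.Request(LOGIN_URL, data=body, method="POST")
    req.add_header("User-Agent", user_agent)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


# Placeholder credentials, for illustration only.
req = build_login_request({"session_key": "a@b.com", "session_password": "pw"})
```

The returned object can be passed to `opener.open(req)` or `urllib.request.urlopen(req)` once the opener with the cookie jar is installed.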

Be aware that this probably will not work either, because you need at least the csrfToken parameter from the login form.

You probably need loginCsrfParam as well, which also comes from the login form on the front page.
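
One way to pull such hidden form values out of the page without depending on the exact attribute order is the standard library's `html.parser`. This is only a sketch: `HiddenInputParser` and `extract_hidden_fields` are hypothetical names, and the `SAMPLE` markup is invented for illustration, not copied from LinkedIn's actual page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of every hidden input found in `html`."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Invented markup standing in for the real login form (an assumption).
SAMPLE = (
    '<form action="/uas/login-submit" method="POST">'
    '<input type="hidden" name="csrfToken" value="ajax:123" id="csrfToken-login">'
    '<input type="hidden" name="loginCsrfParam" value="abc-def" id="loginCsrfParam-login">'
    '<input type="text" name="session_key">'
    '</form>'
)

tokens = extract_hidden_fields(SAMPLE)
```

Because every hidden input is collected, any extra fields the form requires would show up in the same dict and could be merged into the POST data.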

Something like this might work. It is untested, and you may need to add other POST parameters.

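import requests

# One session, so cookies set by the front page are sent with the login POST.
s = requests.Session()

def get_csrf_tokens():
    # Fetch the front page and cut both hidden CSRF values out of the login form.
    url = "https://www.linkedin.com/"
    html = s.get(url).text

    csrf_token = html.split('name="csrfToken" value="')[1].split('" id="')[0]
    login_csrf_token = html.split('name="loginCsrfParam" value="')[1].split('" id="')[0]

    return csrf_token, login_csrf_token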


def login(username, password):
    url = "https://www.linkedin.com/uas/login-submit"
    csrfToken, loginCsrfParam = get_csrf_tokens()

    data = {
        'session_key': username,
        'session_password': password,
        'csrfToken': csrfToken,
        'loginCsrfParam': loginCsrfParam
    }

    # Return the response so the caller can inspect the status and body.
    return s.post(url, data=data)

login('username', 'password')