我目前正在尝试从http://www.spotrac.com/获取需要登录的数据。我当前的尝试使用此代码(我通过对类似主题的一堆其他堆栈溢出问题获得)
from bs4 import BeautifulSoup as bs
from requests import session
payload = {
'id': 'contactForm',
'cmd': 'http://www.spotrac.com/signin/submit/',
'email': '*****',
'password': '*****'
}
with session() as c:
r_login = c.post('http://www.spotrac.com/signin/', data=payload)
print(r_login.headers)
response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
print(response.cookies)
soup=bs(response.text, 'html.parser')
with open('ex.html','w') as f:
f.write(soup.prettify())
我当前的代码做的一切都正确,除了我在发出请求时没有登录。
由于
答案 0 :(得分:1)
您正在向错误的网址发送POST请求,并且还有不正确的有效负载。
POST http://www.spotrac.com/signin/submit/ HTTP/1.1
Host: www.spotrac.com
Connection: keep-alive
Content-Length: 86
Cache-Control: max-age=0
Origin: http://www.spotrac.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://www.spotrac.com/signin/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2206021e191bdbbaf955f111f67b961056%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485487245%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dd6089620b21ecce6837161605055ae04; _ga=GA1.2.910256341.1481865346; _gali=contactForm
redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 27 Jan 2017 04:21:16 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22badb1275aee1cdad6736a6b4bb1ce809%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485490876%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dad486866c32cac526487707cea85b8a9; expires=Fri, 10-Feb-2017 04:21:16 GMT; path=/
Location: http://www.spotrac.com/register/
X-Powered-By: PleskLin
MS-Author-Via: DAV
从上面的会话中可以看出,正确的网址应为http://www.spotrac.com/signin/submit/
,有效负载字符串为redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad
,基本上是:
payload = {'redirect': 'http://www.spotrac.com/',
'email': mail_address,
'password': password}
另外,请确保使用正确的参数模拟headers
,然后您就可以了。