Question

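Based on the very good explanation in this blog post about "Logging in With Requests" and the code snippet from this answer to a question on SO about 'How to "log in" to a website using Python's Requests module?', I have the following code (*) to log in to and browse a website that requires authentication: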

import requests, lxml.html

logurl = 'http://www.somesite.fr/subsite/'
url2 = 'http://www.somesite.fr/subsite/anotherpath/1135'

with requests.session() as s:
    login = s.get(logurl)
    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    form['email'] = 'myemail'
    form['password'] = 'mypassword'

    response = s.post(logurl, data=form)

    r2 = s.get(url2)

If I print form:

{'form_action': 'connexion', 
 'CSRFGuard_token': '762bd944c74e4194db5248279a80bc3eba8e417f0439af2701364e39c0e4b67376c0afc19ba05f2b8fd98ce3b14ac9625d59827b19f2134b4da98c43bef2b57a', 
 'password': 'mypassword', 
 'email': 'myemail'}
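The hidden-field collection step can be checked offline, without contacting the site. The following is a minimal sketch that uses the standard library's html.parser in place of lxml. The sample HTML, the class and function names, and the token value "abc123" are invented for illustration. It also assumes that an input without a name should be skipped and that a missing value attribute should become an empty string (with x.attrib["value"], a hidden input lacking a value would raise a KeyError).

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of hidden inputs that sit inside a <form>."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0   # > 0 while the parser is inside a <form>
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_depth += 1
        elif tag == "input" and self.form_depth > 0:
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and attrs.get("name"):
                # A missing value attribute becomes an empty string.
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth > 0:
            self.form_depth -= 1


def hidden_fields(html_text):
    """Return a dict of the hidden form fields found in html_text."""
    parser = HiddenInputCollector()
    parser.feed(html_text)
    return parser.fields


# Hypothetical login page, not the real site's markup.
SAMPLE_LOGIN_PAGE = """
<form method="post">
  <input type="hidden" name="form_action" value="connexion">
  <input type="hidden" name="CSRFGuard_token" value="abc123">
  <input type="hidden" name="no_value">
  <input type="text" name="email">
</form>
<input type="hidden" name="outside_form" value="ignored">
"""

form = hidden_fields(SAMPLE_LOGIN_PAGE)
form["email"] = "myemail"
form["password"] = "mypassword"
print(form)
```

With a live session, the same function could be fed login.text instead of the sample string.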

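With r2 = s.get(url2), I want to browse this site after authenticating. url2 is a URL I reach after logging in at logurl "manually". The HTML (and the appearance) of these two pages is very different. However, if I print response.text and r2.text, I get exactly the same HTML, namely that of the login page. I conclude that either the login is unsuccessful or the session does not keep the logged-in state...

What am I doing wrong? Thanks!

Edit

Running the code suggested by Brian M. Sheldon: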

import logging
import requests

# enable debug logging with basic logging config
logging.basicConfig(level=logging.DEBUG)

with requests.session() as s:
    s.headers['user-agent'] = 'myapp'  # use non-default user-agent
    response = s.post(logurl, data={'email': 'myemail', 'password': 'mypassword'})
    print(response.headers)

DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): www.somesite.fr

DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "POST /subsite/ HTTP/1.1" 200 1415

and response.headers is:

{'Content-Length': '1415', 'Content-Encoding': 'gzip', 'Set-Cookie': 'PHPSESSID=741q7fj6pnkdl1ho4pr6s35cl1; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Vary': 'Accept-Encoding,Origin', 'Keep-Alive': 'timeout=5, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Date': 'Tue, 25 Apr 2017 14:57:52 GMT', 'Content-Type': 'text/html; charset=UTF-8'}

s.cookies is:

<RequestsCookieJar[<Cookie PHPSESSID=t9t9gvt7enp70v5mb2viebr8v0 for www.somsite.fr/>]>
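One detail worth checking in output like this is whether the session ID sent in Set-Cookie matches the one held in the cookie jar. In the values quoted here they differ, which may only mean the two outputs came from different runs. A minimal sketch of that comparison, using the standard library's http.cookies; the helper name is hypothetical and the two IDs are copied from the output above:

```python
from http.cookies import SimpleCookie


def session_id_from_set_cookie(header_value, cookie_name="PHPSESSID"):
    """Extract one cookie's value from a Set-Cookie header string."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get(cookie_name)
    return morsel.value if morsel is not None else None


# Values copied from the response headers and the cookie jar shown above.
header_id = session_id_from_set_cookie(
    "PHPSESSID=741q7fj6pnkdl1ho4pr6s35cl1; path=/"
)
jar_id = "t9t9gvt7enp70v5mb2viebr8v0"

print("header:", header_id)
print("jar:   ", jar_id)
print("same session id:", header_id == jar_id)
```

In a live session the jar value could be read with s.cookies.get('PHPSESSID') right after the POST, so that both values come from the same run.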

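and s.get(url2) gives:

DEBUG:requests.packages.urllib3.connectionpool:http://www.somesite.fr:80 "GET /subsite/anotherpath/1135 HTTP/1.1" 200 1378

Does this help in understanding what I am doing wrong?

PS: This field has clearly been evolving quickly over the past few years, and some answers from a few years ago are outdated or have been superseded by better options. From what I have read, I think Requests is the best fit for what I want, but other solutions are welcome too. If I have left out any useful information, please tell me and I will edit.

(*) I am sorry, but my problem concerns a website that requires authentication, so I cannot give a reproducible example.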

Answer 1

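It is not possible to give a more specific answer without more information. The first thing I would check is whether the authentication is being returned in the headers; these are available in response.headers. The reason the second request would fail is that the session is not supplying the required authentication, so the site redirects you to the login URL. If you enable debug logging, you can see whether the request is being redirected. Also, some sites block requests that use the default Requests user-agent, so setting the user-agent might help. The entire lxml portion may also be unnecessary. Try the following to get more detail on what is actually happening so we can help further:

import logging
import requests

# enable debug logging with basic logging config
logging.basicConfig(level=logging.DEBUG)

logurl = 'http://www.somesite.fr/subsite/'

with requests.session() as s:
    s.headers['user-agent'] = 'myapp'  # use non-default user-agent
    response = s.post(logurl, data={'email': 'myemail', 'password': 'mypassword'})
    print(response.headers)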
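Besides reading the debug log, the redirect check can be done on the response object itself, since a Requests response exposes status_code, url, history and headers. Below is a rough sketch of a helper that summarizes those attributes. The function name and the "ended on login URL" comparison are assumptions for illustration, and the demo uses stand-in objects built with SimpleNamespace rather than real responses, so it runs without network access.

```python
from types import SimpleNamespace


def summarize_response(response, login_url):
    """Describe where a request ended up, using attributes that
    requests.Response objects provide (status_code, url, history, headers)."""
    hops = [(r.status_code, r.url) for r in response.history]
    return {
        "status": response.status_code,
        "final_url": response.url,
        "redirects": hops,
        # Heuristic only: landing back on the login URL suggests no auth.
        "ended_on_login_url": response.url.rstrip("/") == login_url.rstrip("/"),
        "set_cookie": response.headers.get("Set-Cookie"),
    }


logurl = "http://www.somesite.fr/subsite/"
url2 = "http://www.somesite.fr/subsite/anotherpath/1135"

# Stand-in for a response that was redirected back to the login page.
# This is a made-up scenario, not output from the real site.
fake_redirected = SimpleNamespace(
    status_code=200,
    url=logurl,
    history=[SimpleNamespace(status_code=302, url=url2)],
    headers={},
)

summary = summarize_response(fake_redirected, logurl)
for key, value in summary.items():
    print(key, "=", value)
```

With a real session, the call would be summarize_response(s.get(url2), logurl). An empty redirects list together with a 200 status would mean the server returned the login HTML directly at url2 rather than redirecting.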
