Question

I am trying to fetch the content of https://url with pycurl in Python 3.6.
Setup: I connect through an HTTPS proxy that uses NTLM authentication, then authenticate at https://url, and then want to fetch the page content.
Originally I used the requests module, but I ran into problems there because of the proxy (without the proxy everything works). See: Tunnel through https proxy with ntlm authentification and authentificate at https url afterwards
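I can get the content of https://example.org, but not the content of https://www.google.de, nor of the URL I am actually interested in. In both failing cases the error message is the same and points to an encoding problem (I quote it after the code).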
Code:

import pycurl
from io import BytesIO

url = 'https://targeturl'
#url_example = 'https://example.org'
#url_google = 'https://www.google.de'

url_user = 'user'
url_pwd = 'url_examplepwd'

proxy_ip = 'ip'
proxy_port = 8080  # placeholder (hypothetical value): use your proxy's port, as an int
proxy_user = 'domain\\user'
proxy_pwd = 'proxy_examplepwd'

buffer = BytesIO()
conn = pycurl.Curl()
conn.setopt(pycurl.VERBOSE, True)

#setting up proxy
conn.setopt(pycurl.PROXY, proxy_ip)
conn.setopt(pycurl.PROXYPORT, proxy_port)
conn.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
conn.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)
conn.setopt(pycurl.PROXYUSERNAME, proxy_user)
conn.setopt(pycurl.PROXYPASSWORD, proxy_pwd)

#setting up url
#======this does not work==========
conn.setopt(pycurl.URL, url)
conn.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
conn.setopt(pycurl.USERPWD, url_user + ':' + url_pwd)

#======this also does not work====
#conn.setopt(pycurl.URL, url_google)

#=====this works=======
#conn.setopt(pycurl.URL, url_example)

conn.setopt(pycurl.WRITEDATA, buffer)

conn.perform()
conn.close()

content = buffer.getvalue()  # raw bytes, not yet decoded

Error message: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 122: invalid start byte
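The byte in the message is a useful hint: 0xfc can never start a UTF-8 sequence, but it is "ü" in ISO-8859-1 (Latin-1) and Windows-1252, so whatever is being decoded is probably German text in a single-byte encoding rather than UTF-8. A minimal sketch that reproduces the error and shows a tolerant fallback; the sample bytes and the `decode_body` helper are illustrative assumptions, not part of the original code:

```python
raw = b"Gr\xfc\xdfe"  # hypothetical sample: "Grüße" encoded as ISO-8859-1

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # same kind of error as above, here at position 2


def decode_body(raw, declared_charset=None):
    """Decode bytes: try the declared charset, then UTF-8, then Latin-1."""
    for encoding in (declared_charset, "utf-8"):
        if encoding:
            try:
                return raw.decode(encoding)
            except (UnicodeDecodeError, LookupError):  # bad bytes or unknown name
                pass
    # Latin-1 maps every byte 0x00-0xff to a code point, so this cannot fail.
    return raw.decode("latin-1")


print(decode_body(raw))
```

The Latin-1 fallback never raises, but it can silently produce wrong characters for text that is really in another encoding, so the declared charset should be preferred when one is available.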

Interestingly, the error message is identical for url and url_google.
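Because the failure is identical for two unrelated sites, it may help to see which response the headers actually come from. As far as I know, for an https:// URL behind an HTTP proxy libcurl first talks to the proxy (CONNECT, plus the NTLM 407 round trips), and by default those proxy replies are passed to the header callback together with the target server's headers (libcurl's `CURLOPT_SUPPRESS_CONNECT_HEADERS` exists to turn that off). A small sketch that groups captured header lines per response; `split_responses` and the captured lines are hypothetical:

```python
def split_responses(header_lines):
    """Group header lines (bytes, as pycurl's HEADERFUNCTION receives them)
    into one list per HTTP response; each status line starts a new group."""
    responses = []
    for line in header_lines:
        # Header bytes are decoded as ISO-8859-1 so this step itself cannot fail.
        text = line.decode("iso-8859-1").rstrip("\r\n")
        if text.startswith("HTTP/"):
            responses.append([text])
        elif text and responses:
            responses[-1].append(text)
    return responses


# Hypothetical capture: NTLM challenge from the proxy, tunnel set up, then the page.
captured = [
    b"HTTP/1.1 407 Proxy Authentication Required\r\n",
    b"Proxy-Authenticate: NTLM\r\n",
    b"\r\n",
    b"HTTP/1.1 200 Connection established\r\n",
    b"\r\n",
    b"HTTP/1.1 200 OK\r\n",
    b"Content-Type: text/html; charset=ISO-8859-1\r\n",
    b"\r\n",
]
for response in split_responses(captured):
    print(response[0], "|", len(response) - 1, "header(s)")
```

To capture real lines, something like `header_lines = []` and `conn.setopt(pycurl.HEADERFUNCTION, header_lines.append)` before `conn.perform()` should work, since pycurl hands each header line to the callback as bytes.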

It doesn't matter whether I solve this with pycurl, requests, or any other module. The goal is simply to fetch the page content through the proxy.
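Since any module is acceptable, here is a hedged pycurl sketch (untested against a real NTLM proxy) that keeps the body as raw bytes, records the header lines, and decodes only at the end, using the charset of the final response. The names `fetch` and `charset_from_headers` are hypothetical. `HTTPPROXYTUNNEL` is set explicitly to request a CONNECT tunnel; as far as I know libcurl already tunnels https:// URLs through an HTTP proxy, so this mostly documents intent.

```python
import re
from io import BytesIO

CHARSET_RE = re.compile(r'charset="?([\w.:-]+)', re.IGNORECASE)


def charset_from_headers(header_lines):
    """Return the charset from the Content-Type of the LAST response, or None.

    The charset is reset at every status line, because behind a proxy the
    earlier header blocks belong to the proxy's 407/CONNECT replies.
    """
    charset = None
    for line in header_lines:
        text = line.decode("iso-8859-1")
        if text.startswith("HTTP/"):
            charset = None
        elif text.lower().startswith("content-type:"):
            match = CHARSET_RE.search(text)
            charset = match.group(1) if match else None
    return charset


def fetch(url, url_user, url_pwd, proxy_host, proxy_port, proxy_user, proxy_pwd):
    """Fetch url through an NTLM-authenticated HTTP proxy; return (status, text)."""
    import pycurl  # imported here so charset_from_headers works without pycurl

    body = BytesIO()
    header_lines = []
    conn = pycurl.Curl()
    try:
        conn.setopt(pycurl.PROXY, proxy_host)
        conn.setopt(pycurl.PROXYPORT, proxy_port)
        conn.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_HTTP)
        conn.setopt(pycurl.HTTPPROXYTUNNEL, True)  # CONNECT tunnel for HTTPS
        conn.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)
        conn.setopt(pycurl.PROXYUSERNAME, proxy_user)
        conn.setopt(pycurl.PROXYPASSWORD, proxy_pwd)
        conn.setopt(pycurl.URL, url)
        conn.setopt(pycurl.HTTPAUTH, pycurl.HTTPAUTH_BASIC)
        conn.setopt(pycurl.USERPWD, url_user + ":" + url_pwd)
        conn.setopt(pycurl.WRITEDATA, body)  # body stays raw bytes
        conn.setopt(pycurl.HEADERFUNCTION, header_lines.append)  # lines as bytes
        conn.perform()
        status = conn.getinfo(pycurl.RESPONSE_CODE)
    finally:
        conn.close()

    raw = body.getvalue()
    charset = charset_from_headers(header_lines) or "utf-8"
    try:
        text = raw.decode(charset, errors="replace")
    except LookupError:  # charset name in the header is unknown to Python
        text = raw.decode("utf-8", errors="replace")
    return status, text
```

`errors="replace"` trades strictness for robustness: undecodable bytes become U+FFFD instead of raising, which makes it easier to see the page (or a proxy error page) first and fix the encoding afterwards.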

pycurl encoding problem when tunnelling through an NTLM proxy
