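I have a Python script that uses requests to fetch pages, and I need to go through a proxy to access them. When I access an http page, the request goes through the proxy, but when I access an https page, it does not (I checked this with logging, as described below). I have checked with the proxy service provider (proxymesh), and they say their proxy also works for https pages. What do I need to change in my script when accessing https sites as opposed to http sites?
My code is shown below. At the end of this question I have included the log output generated for the http and https sites, which shows the proxy being used for http but not for https.
Any ideas would be very helpful.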
import logging
import requests
# set up logging
logging.getLogger('').handlers = []
logging.basicConfig(
    filename="mylog_with_proxy.log",  # in my code, the full path is specified
    filemode="w",
    level=logging.DEBUG)
#specify proxies and headers
proxies = {'http': 'http://fr.proxymesh.com:31280', 'https': 'http://fr.proxymesh.com:31280'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393',}
#the two URLs that I accessed. One is for an http site and the other one is for an https site. These sites are just examples of sites I need to access.
http_url = "http://docs.python-requests.org/en/master/user/quickstart/"
https_url = "https://www.haskell.org/happy/"
#get the page. I executed the script twice - once for http_url and the second time for https_url. Here, it shows http_url
r = requests.get(http_url, headers=headers, proxies=proxies, timeout=5)
r.raise_for_status()
The log files look like this:
When accessing the http site (i.e. when running the script with http_url):
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): fr.proxymesh.com
DEBUG:requests.packages.urllib3.connectionpool:"GET http://docs.python-requests.org/en/master/user/quickstart/ HTTP/1.1" 200 None
When accessing the https site (i.e. when running the script with https_url):
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.haskell.org
DEBUG:requests.packages.urllib3.connectionpool:"GET /happy/ HTTP/1.1" 200 None
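To rule out a mistake in the proxies dict itself, here is a minimal sketch of the scheme-keyed lookup that the dict implies, using only the standard library. It is a simplified illustration and not the actual selection logic inside requests (which, as far as I know, also considers per-host keys and an `all` key). The helper name `proxy_for` is hypothetical. It only shows which entry is configured for each URL; it does not prove that traffic is actually routed through the proxy.

```python
from urllib.parse import urlsplit


def proxy_for(url, proxies):
    """Return the proxy URL configured for this URL's scheme, or None.

    Hypothetical helper: a simplified stand-in for scheme-based proxy
    selection, not the implementation used by requests.
    """
    scheme = urlsplit(url).scheme.lower()
    return proxies.get(scheme)


# Same proxy configuration as in the script above.
proxies = {
    "http": "http://fr.proxymesh.com:31280",
    "https": "http://fr.proxymesh.com:31280",
}

for url in (
    "http://docs.python-requests.org/en/master/user/quickstart/",
    "https://www.haskell.org/happy/",
):
    # Print the proxy entry that matches each URL's scheme.
    print(url, "->", proxy_for(url, proxies))
```

Both URLs resolve to the same proxy entry, so the dict appears to cover both schemes; the difference in the logs would have to come from somewhere else.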