Downloading an encrypted web page

Time: 2017-06-04 01:31:24

Tags: python string cookies

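Hello, I have researched this problem but can't find an answer. I need to download a subdirectory of a website into a string so that I can search it. I know what that involves; the only problem is that the site is encrypted and requires a login to access the directory. I know I need to send a cookie with the download request, but I'm not sure how to do that. I'm writing in Python. Feel free to ask for more information.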

1 answer:

Answer 0: (score: 0)

import urllib
import urllib2
import cookielib

# The cookie jar stores cookies set by the server; the opener sends them back on later requests.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)


# POST parameters for the login page.
request_body_params = {'your_parameter_name': 'its_value', 'another_parameter_name': 'its_value'}


data_encoding = urllib.urlencode(request_body_params)
url_main = 'https://your_site.com/login'

main_request = urllib2.Request(url_main, data_encoding)

# Any required headers go here.
main_request.add_header('Accept-encoding', 'gzip')

# This is the login response. You don't need to read it; its cookies are kept in cookie_jar.
main_response = urllib2.urlopen(main_request)

# You want data from this link.
url_results = 'https://your_site.com/sub_directory'
results_response = urllib2.urlopen(url_results)

print results_response.read()
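
The code above targets Python 2. In Python 3, urllib2 and cookielib were reorganized into urllib.request and http.cookiejar, so a rough equivalent might look like the sketch below. The helper names make_opener, build_login_request, and fetch_text are made up for illustration, and the UTF-8 default is an assumption about the site's encoding.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))
    return opener, cookie_jar


def build_login_request(url, params):
    """Build a POST request carrying the form-encoded login parameters."""
    # In Python 3 the request body must be bytes, not str.
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=body)


def fetch_text(opener, url, encoding="utf-8"):
    """Download a page through the opener and decode it into a string."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    with opener.open(url) as response:
        return response.read().decode(encoding, errors="replace")
```

Opening the request from build_login_request once with opener.open(...), then calling fetch_text(opener, 'https://your_site.com/sub_directory'), mirrors the two urlopen calls above; the returned string can then be searched with the in operator or the re module.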

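To find the POST parameters, open the site in your browser, open the developer tools (for example via "Inspect element"), and go to the "Network" tab. Then log in from the browser; a network log will be generated. Click the login request in that log and look at its POST parameters and headers.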
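
If the network log shows the login body as a raw form string, it can be converted into the request_body_params dictionary rather than retyped by hand. This is a minimal Python 3 sketch, assuming the form is sent as application/x-www-form-urlencoded; params_from_form_data is a hypothetical helper and the raw string is a placeholder, not real site data.

```python
from urllib.parse import parse_qsl, urlencode


def params_from_form_data(raw_body):
    """Turn a copied form body such as 'a=1&b=2' into a dict of parameters.

    Note: a dict keeps only the last value if a key is repeated.
    """
    return dict(parse_qsl(raw_body, keep_blank_values=True))


# Placeholder body, standing in for one copied from the browser's network log.
raw = "your_parameter_name=its_value&another_parameter_name=its_value"
request_body_params = params_from_form_data(raw)
print(request_body_params)
```

Keeping blank values matters because login forms often include empty hidden fields that the server still expects to receive.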