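I'm trying to automatically retrieve data from this website (the file I want is "BVBG.086.01 PriceReport"). Inspecting the page with Firefox, I found that the POST goes to the request URL "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp", and the parameters are: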
hdnStatus: "ativo"
chkArquivoDownload_ativo: "28"
txtDataDownload_ativo: "09/02/2018"
imgSubmeter: "Download"
txtDataDownload_externo_ativo: [3]
    0: "25/08/2017"
    1: "25/08/2017"
    2: "25/08/2017"
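For reference, these parameters correspond to a form-encoded body in which txtDataDownload_externo_ativo appears three times. A minimal sketch using only the standard library shows one way such a body can be built. The names form_fields and encoded_body are illustrative, and the sketch assumes the server accepts percent-encoded slashes in the dates:

```python
from urllib.parse import urlencode, parse_qs

# Form fields as shown by the Firefox network inspector.
form_fields = {
    "hdnStatus": "ativo",
    "chkArquivoDownload_ativo": "28",
    "txtDataDownload_ativo": "09/02/2018",
    "imgSubmeter": "Download",
    "txtDataDownload_externo_ativo": ["25/08/2017", "25/08/2017", "25/08/2017"],
}

# doseq=True expands the list into three repeated key=value pairs
# instead of encoding the Python list's repr as a single value.
encoded_body = urlencode(form_fields, doseq=True)

# Show the exact body and its byte length (what Content-Length would be).
print(encoded_body)
print(len(encoded_body.encode("ascii")))

# Decoding the body again recovers the three repeated values.
print(parse_qs(encoded_body)["txtDataDownload_externo_ativo"])
```

Note that urlencode writes each "/" in the dates as %2F, so the length of this body need not match a Content-Length copied from the browser.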
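So, if I use hurl.it to make the request, the response is the correct 302 redirect (pointing to the FTP URL of the requested file, something like "Location: /FTP/Temp/10981738/Download.ex_"). (Example of the request here).
So I tried the following code (using Python's "requests" library). I tried two versions of request_body, and also tried passing it explicitly as the "data" parameter of the post method: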
from bs4 import BeautifulSoup
from requests import post

request_url = "http://www.bmf.com.br/arquivos1/lum-download_ipn.asp"
# Headers copied from the request that Firefox sends
request_headers = {
    "Host": "www.bmf.com.br",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo",
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": "236",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1"
}
# Version 1: the body as a pre-encoded string
# request_body = "hdnStatus=ativo&chkArquivoDownload_ativo=28&txtDataDownload_ativo=09/02/2018&imgSubmeter=Download&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017&txtDataDownload_externo_ativo=25/08/2017"
# Version 2: the body as a dict (the list value repeats the key three times)
request_body = {
    "hdnStatus": "ativo",
    "chkArquivoDownload_ativo": "28",
    "txtDataDownload_ativo": "09/02/2018",
    "imgSubmeter": "Download",
    "txtDataDownload_externo_ativo": ["25/08/2017", "25/08/2017", "25/08/2017"]
}
# The second positional argument of post() is "data", so both calls are equivalent
result_query = post(request_url, request_body, headers=request_headers)
# result_query = post(request_url, data=request_body, headers=request_headers)
# Print every intermediate redirect response, then the final URL
for red in result_query.history:
    print(BeautifulSoup(red.content, "lxml"))
    print()
print(result_query.url)
What I get is the following response:
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a href="lumarquivos_ipn.asp">here</a>.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a href="/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=">here</a>.</body>
</html>
<html><head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a href="/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo">here</a>.</body>
</html>
http://www.bmf.com.br/arquivos1/lum-arquivos_ipn.asp?idioma=pt-BR&status=ativo
instead of the one I want (which should point to the location of the file). What am I doing wrong here?
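A minimal sketch that might help narrow this down: it sends the same POST with the standard library only, lets the library compute Content-Length, and does not follow redirects, so the status code and Location header of the very first response can be compared with what hurl.it reports. NoRedirect, build_request and first_hop are illustrative names, not part of the code above, and nothing in the output above confirms that the server will answer with the FTP location this way.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def build_request(url, fields):
    """Build a form-encoded POST; Content-Length is added by urllib when sending."""
    body = urlencode(fields, doseq=True).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def first_hop(req, timeout=30):
    """Send the request and return (status, Location) of the first response only."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(req, timeout=timeout)
        return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With NoRedirect installed, 3xx responses arrive here.
        return err.code, err.headers.get("Location")


# Usage with the variables from the code above (performs a real network call):
# status, location = first_hop(build_request(request_url, request_body))
# print(status, location)
```

If the first hop already points back to lum-arquivos_ipn.asp rather than to /FTP/..., the difference from the hurl.it request would have to be in what is sent (body or headers), not in how the redirects are followed.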