Question

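I need to visit several thousand URLs and download a file from each one.

I have tried both urllib and requests. Both appear to complete successfully, but when I look at the downloaded files they are always identical and contain an error message.

I am running Python 2.7 on a Windows 10 machine.

I tried the following, and after the script finishes I get the file test4.pdf on my computer.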

import requests

dls = "http://wbdocsservices.xxOrgNamexx.org/services?I4_SERVICE=FILE_URLS2&amp;RENDITION=Y&amp;I4_DOCID=090224b0828cd94a&amp;stream=Yes"

response = requests.get(dls)
with open("test4.pdf", "wb") as local_file:
    local_file.write(response.content)
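
Because the script finishes without an error even when the server sends back something other than a PDF, a check before writing may help when looping over thousands of URLs. The following is a minimal sketch, not a definitive fix: `save_if_pdf` is a hypothetical helper name, and it assumes a valid PDF begins with the bytes `%PDF` at the very start of the file (some viewers tolerate a few leading bytes, which this sketch would reject).

```python
PDF_MAGIC = b"%PDF"  # assumed signature at the start of a PDF file


def save_if_pdf(content, path):
    """Write content to path only if it starts with the PDF signature.

    Returns True if the file was written, False otherwise, so the
    caller can log which URLs did not return a PDF.
    """
    if not content.startswith(PDF_MAGIC):
        return False
    with open(path, "wb") as local_file:
        local_file.write(content)
    return True
```

With the script above, the call would be `save_if_pdf(response.content, "test4.pdf")`; a `False` result means the body was something else, such as an HTML page.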

test4.pdf is not recognized as a PDF and cannot be opened. When I rename the file to test4.txt, I can open it. The contents of the file are:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Securid Redirect Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="refresh" content="0;url=https://wbssocert.xxOrgNamexx.org/fed/secure/crxredirect.jsp" />
</head>
<body />
</html>
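
The page above contains no PDF data, only a meta refresh tag that sends a browser to another URL. requests follows HTTP 3xx redirects by default, but it does not act on meta refresh tags inside HTML. The following is a minimal sketch for pulling that target URL out of such a page. `find_meta_refresh_url` is a hypothetical helper name, and the regular expression assumes the `http-equiv` attribute comes before `content`, as it does in the page above. Judging by the title "Securid Redirect Page", the target is probably a sign-on step, so fetching it without credentials may still not return the PDF; that is an assumption, not something the output confirms.

```python
import re

# Matches <meta http-equiv="refresh" content="N;url=..."> and captures the URL.
# Assumes http-equiv appears before content within the tag.
_META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def find_meta_refresh_url(html_text):
    """Return the target URL of a meta refresh tag, or None if none is found."""
    match = _META_REFRESH.search(html_text)
    return match.group(1) if match else None
```

Applied to the page above, `find_meta_refresh_url(response.text)` would return `https://wbssocert.xxOrgNamexx.org/fed/secure/crxredirect.jsp`.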

Also, when I click the download link in Chrome or IE, it opens the PDF document, but the address bar shows:

http://wbescsprd3.xxOrgNamexx.org:9280/ACS/servlet/ACS?command=read&version=2.3&docbaseid=0224b0&basepath=%2Fwbpfiles26%2Fwbecmoksp%2Fdata%2Fwbecmoksp%2Fwbdocs_storage_25%2F000224b0&filepath=80%2F01%2F2e%2F62.pdf&objectid=090224b0828cd94a&cacheid=dggEAgA%3D%3DYi4BgA%3D%3D&format=pdftext&pagenum=0&signature=oWk4P5lnQ41G9e6L%2BeQIAmMrq%2BrkHDq5XsZDD2yZ7Gm60YkKuXWHhsbuuONqev4MFbGIB6C6GJiXefsK4RF8i7tBbfczFvDSiJTgBHB2YPZ0es%2BU%2BGCxeMmmYf67pI2mC36CawfqkifjTfE4otqfu%2BSY2TGxmQ1uIZrZOnQo4Is%3D&servername=Awbescsprd3_wbecmoksp&mode=1&timestamp=1562058840&length=9397463&mime_type=application%2Fpdf&parallel_streaming=true&encryption_mode=require&expire_delta=360

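I also tried urllib and got exactly the same result as above. Any help or pointers on how to download the files would be appreciated.

Thanks a ton.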

Download a file from a redirected link using Python

0 answers: