Question

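So, I can get the content of the web page behind the PDF link (EXAMPLE OF THE LINK HERE), but I don't want the web page's content. I want the PDF's content, so that I can save it as a PDF file in a folder on my computer.

I have already done this successfully on sites where I don't need to log in and there is no proxy server.

Relevant code: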

import os
import urllib2
import time
import requests
import urllib3
from random import *


s = requests.Session()
data = {"Username":"username", "Password":"password"}
url = "https://login.url.com"

print "doing things"
r2 = s.post(url, data=data, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)

#I get a response 200 from printing r2
print r2


download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"

# Raw string so the backslashes in the Windows path are never read as escapes
file = open(r"F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies = {'https' : 'https://PROXYip:PORT'}, verify=False)

#This prints out the response from the proxy server (i.e. 200)
print temp

something = uniform(5,6)
print something
time.sleep(something)

#This gets me the content of the web page, not the content of the PDF
print temp.content

file.write(temp.content)
file.close()

I need help figuring out how to download the content of the PDF.

Answer 1

Try this:

import requests

url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'

pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
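
Before writing the response to disk, it can help to confirm that the server really returned a PDF and not an HTML page (for example a login page). Below is a minimal sketch of such a check. The helper names `looks_like_pdf` and `save_if_pdf` are hypothetical, not part of requests; the check relies only on the fact that PDF files begin with the bytes `%PDF-`.

```python
def looks_like_pdf(content):
    """Return True if the bytes begin with the PDF signature '%PDF-'."""
    return content.startswith(b'%PDF-')


def save_if_pdf(content, path):
    """Write content to path only when it looks like a PDF.

    Returns True if the file was written, False otherwise.
    """
    if not looks_like_pdf(content):
        # Probably an HTML page (login form, error page), so write nothing.
        return False
    with open(path, 'wb') as handle:
        handle.write(content)
    return True
```

With the snippet above, this would be called as `save_if_pdf(pdf.content, 'walmart.pdf')`. A `False` result suggests the site sent a web page instead of the document, which usually points at missing login cookies.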

Edit

Try again using a requests session to manage the cookies (assuming the site sends them to you after login), and possibly a different proxy.

proxy_dict = {'https': 'ip:port'}
with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # data variable should hold user/password information
    auth = session.get(login_url, data=data, proxies=proxy_dict, verify=False)
    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        # We're continuing the same session, so the login cookies are sent along
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))
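
One more thing worth checking in this setup: as far as I know, requests picks the proxy by the scheme of the URL being requested, so a `proxies` dict with only an `'https'` key is not applied to the `http://` download URL. Here is a minimal sketch that maps both schemes to the same proxy. The helper name `build_proxies` is hypothetical, and `https://PROXYip:PORT` is just the placeholder from the question.

```python
def build_proxies(proxy_url):
    """Return a proxies dict that routes both http and https URLs
    through the same proxy."""
    return {'http': proxy_url, 'https': proxy_url}


# Placeholder address taken from the question; replace with the real proxy.
proxy_dict = build_proxies('https://PROXYip:PORT')
```

The result can be passed as `proxies=proxy_dict` to both the login request and the download request.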

Need to download the PDF, not the content of the web page
