Downloading a file named in an HTTP Content-Disposition header with Python

Date: 2015-02-15 17:55:25

Tags: python

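I'm stuck and need some help. I can't seem to find any reference on how to download the file. I have been using requests to find the file name, but I don't know where to go from there.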

'Content-Disposition': 'attachment; filename=MT0376_DealerPrice.zip'

Any help would be greatly appreciated.

import requests


dealerCode = 'Code'
userName ='User'
password = 'Pass'
loginUrl = 'https://www.lemansnet.com/login'

payload = {'rememberMe': 'on', 
           'dealerCode': dealerCode, 
           'dm': 4, 
           'userName': userName, 
           'password': password}


r = requests.post(loginUrl, params=payload)


token = r.headers['loginToken']


serviceURL = 'https://www.lemansnet.com/pricing/2013/pos'

# Request body for the pricing service.
xml = """<pricing>
<whoForDealer>
<dealerCode>MT0376</dealerCode>
</whoForDealer>
<rememberPreferences>1</rememberPreferences>
</pricing>"""

# requests computes Content-Length from the body, so it is not set by hand.
# The charset belongs in Content-Type rather than in a header of its own.
requestHeaders = {'Content-Type': 'text/xml; charset=utf-8',
                  'loginToken': token,
                  'Cache-Control': 'no-cache',
                  'Pragma': 'no-cache',
                  'Connection': 'keep-alive'}

# POST the XML with the login token; the response body is the zip file.
t = requests.post(serviceURL, data=xml, headers=requestHeaders)

#print(r.url)
#print(r.headers)
#print(r.text)
#print(r.status_code)
#print(r.content)
#print(t.url)
#print(t.headers)
#print(t.text)
print(t.status_code)
print(t.content)
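# The header looks like: attachment; filename=MT0376_DealerPrice.zip
disposition = t.headers['Content-Disposition']
localName = disposition.split('filename=')[1].strip('"')
#print(localName)

# Save the response body under the name the server supplied.
with open(localName, "wb") as code:
    code.write(t.content)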



print('end')

This is the output I currently get from the code above.

200
b'PK\x03\x04\x14\x00\x08\x08\x08\x00H\x8aOF\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x1b\x00\x00\x00PriceFile_system_errors.txt\xe3\xf5M\xac\xc8\xcc-\xcdUHI\xcc\xcc\xa9T\xc8\xc9\xcc\xcd,QH\xadHNMMIM\xd1\xe3\xc5+\x0b\x00PK\x07\x08\xcdO\x92K#\x00\x00\x00<\x00\x00\x00PK\x01\x02\x14\x00\x14\x00\x08\x08\x08\x00H\x8aOF\xcdO\x92K#\x00\x00\x00<\x00\x00\x00\x1b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00PriceFile_system_errors.txtPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00I\x00\x00\x00l\x00\x00\x00\x00\x00'
end
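
The bytes printed above begin with PK, the ZIP signature, and the only member name that appears in them is PriceFile_system_errors.txt, so the response looks like a valid archive whose contents are worth reading. A minimal sketch for doing that in memory, assuming the archive is small enough to hold in RAM; read_zip_members is a hypothetical helper name, not part of the code above.

```python
import io
import zipfile


def read_zip_members(data):
    """Return {member name: member bytes} for a ZIP archive held in memory."""
    # BytesIO gives ZipFile the seekable file-like object it needs.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

With the code above it could be called as read_zip_members(t.content), then each value decoded and printed to see what the server put in the file.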

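Thanks for the help, everyone. I think I've managed to solve the problem. I've edited the code above to what I believe is the working version. Unfortunately, the site I'm pulling the data from only lets me download the file 3 times a day, so I won't be able to actually test it until tomorrow.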

1 answer:

Answer 0: (score: -1)

Python's built-in urllib.urlretrieve works fine (Python 2):

import urllib

output_filename = '/tmp/my_downloaded_file.zip'
url = 'http://www.blog.pythonlibrary.org/wp-content/uploads/2012/06/'
url += 'wxDbViewer.zip'
# download...
urllib.urlretrieve(url, output_filename)

In Python 3:

from urllib.request import urlretrieve
...
urlretrieve(url, output_filename)
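
The download in the question is a POST that carries a loginToken header, and urlretrieve has no parameter for request headers, so staying with requests may be simpler there. Below is a rough sketch of one way to take the file name from Content-Disposition and write the body to disk. The helper names filename_from_disposition and save_response and the fallback name download.bin are assumptions for illustration; the header parsing uses the standard library's email.message.Message.

```python
import os
from email.message import Message


def filename_from_disposition(header_value, default="download.bin"):
    """Return the filename parameter of a Content-Disposition value."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoted and unquoted values
    if not name:
        return default
    # Keep only the last path component so the header cannot choose a directory.
    return os.path.basename(name) or default


def save_response(response, directory="."):
    """Write a requests.Response body to disk in chunks and return the path."""
    disposition = response.headers.get("Content-Disposition", "")
    path = os.path.join(directory, filename_from_disposition(disposition))
    with open(path, "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
    return path
```

After the requests.post call in the question, save_response(t) would write the file under the server-supplied name. For large files, passing stream=True to requests.post keeps the body from being loaded into memory all at once.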