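This is my first question, so please be kind if I've done anything wrong.
I'm using the requests module in Python 3.3 to automate downloading files from several sites, but this one in particular is giving me trouble when I try to get the CSV file. I'm reasonably capable in Python, but when it comes to interacting with websites I'm not familiar with HTML and JavaScript.
Here is the relevant code.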
import requests
import datetime
now = datetime.datetime.now().strftime("%Y%m%d")
folder = 'some path'
url = 'https://gats.pjm-eis.com/gats2/PublicReports/RenewableGeneratorsRegisteredInGATS/'#ExportTo'
payload = {'exportType': 'CSV',
           'tabNumber': ''}
doc = requests.post(url, data=payload, stream=True)
output = open(folder+now+'_GATSRegistered.csv','wb')
output.write(doc.content)
output.close()
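I don't get any errors, but the document I create is based on an error page. I've successfully done this for sites where the URL points directly to the file ('http://www.place.com/path/file.xlsx'), so I know what to do with the file once it has been retrieved. But that only requires a "get" request.
So, my question: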
Answer 0 (score: 1)
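I looked at the page in Chrome, opened the developer console, and opened the Network tab. There you can see that clicking the "CSV" button sends a POST request containing a lot of form data.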
exportType:CSV
tabNumber:
CSV_CH:1
PRN_CH:0
GridView$DXFREditorcol0:
GridView$DXFREditorcol1:
GridView$DXFREditorcol2:
GridView$DXFREditorcol3:
GridView$DXFREditorcol4:
GridView$DXFREditorcol5:
GridView$DXFREditorcol6:
GridView$DXFREditorcol7:
GridView$DXFREditorcol8:
GridView$DXFREditorcol9:
GridView$DXFREditorcol10:
GridView$DXFREditorcol11:
GridView$DXFREditorcol12:
GridView$DXFREditorcol13:
GridView$DXFREditorcol14:
GridView$DXFREditorcol15:
GridView$DXFREditorcol16:
GridView$DXFREditorcol17:
GridView$DXFREditorcol18:
GridView$DXFREditorcol19:
GridView$DXFREditorcol20:
GridView$DXFREditorcol21:
GridView$DXFREditorcol22:
GridView$DXFREditorcol23:
GridView$DXFREditorcol24:
GridView$DXFREditorcol25:
GridView$DXFREditorcol26:
GridView_custwindowWS:0:0:-1:-10000:-10000:0:1px:-10000:1:0:0:0
GridView_DXHFPWS:0:0:-1:-10000:-10000:0:180px:100px:1:0:0:0
GridView_DXPagerBottom_PSPSI:2
GridView$DXSelInput:
GridView$DXKVInput:[]
GridView$CallbackState:BwMHAQIFU3RhdGUGEAEHGwcAAgEHAQIBBwICAQcDAgEHBAIBBwUCAQcGAgEHBwIBBwgCAQcJAgEHCgIBBwsCAQcMAgEHDQIBBw4CAQcPAgEHEAIBBxECAQcSAgEHEwIBBxQCAQcVAgEHFgIBBxcCAQcYAgEHGQIBBxoCAQcABxsHAAcABwEHAAcCBwAHAwcABwQHAAcFBwAHBgcABwcHAAcIBwAHCQcABwoHAAcLBwAHDAcABw0HAAcOBwAHDwcABxAHAAcRBwAHEgcABxMHAAcUBwAHFQcABxYHAAcXBwAHGAcABxkHAAcaBwAHAAcAAgAFAAAAgAkCCUVudGl0eUtleQkCAAIAAwcEAgAHAAIBBTaVAAAHAAIBBwAHAAIQRmlsdGVyRXhwcmVzc2lvbgcCAAIIUGFnZVNpemUDBzI=
GridView$DXSyncInput:
GridView_DXFilterRowMenuCI:
DXScript:1_142,1_80,1_135,1_91,14_0,1_90,1_113,14_23,14_10,1_98,1_105,1_77,1_128,1_126,1_124,1_133,1_119,1_127,1_104,1_101,1_84,1_109,1_92,14_1,1_94,1_97,1_95,1_96,1_106,14_4,1_100,1_117,1_103,14_12,14_13,1_102,1_129,1_107,1_137,1_114,14_16,10_2,10_1,10_3,10_4,14_3
DXMVCEditorsValues:{"GridView_DXFREditorcol0":null,"GridView_DXFREditorcol1":null,"GridView_DXFREditorcol2":null,"GridView_DXFREditorcol3":null,"GridView_DXFREditorcol4":null,"GridView_DXFREditorcol5":null,"GridView_DXFREditorcol6":null,"GridView_DXFREditorcol7":null,"GridView_DXFREditorcol8":null,"GridView_DXFREditorcol9":null,"GridView_DXFREditorcol10":null,"GridView_DXFREditorcol11":null,"GridView_DXFREditorcol12":null,"GridView_DXFREditorcol13":null,"GridView_DXFREditorcol14":null,"GridView_DXFREditorcol15":null,"GridView_DXFREditorcol16":null,"GridView_DXFREditorcol17":null,"GridView_DXFREditorcol18":null,"GridView_DXFREditorcol19":null,"GridView_DXFREditorcol20":null,"GridView_DXFREditorcol21":null,"GridView_DXFREditorcol22":null,"GridView_DXFREditorcol23":null,"GridView_DXFREditorcol24":null,"GridView_DXFREditorcol25":null,"GridView_DXFREditorcol26":null}
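You can work out which of the above are absolutely necessary for you to send to the server. I suspect that all of them are required (but I've been wrong plenty of times :)).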
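Typing all of those fields by hand is error-prone, so here is a minimal sketch of turning the form data, as copied from the Network tab, into a dict for requests. The helper name parse_form_data is made up for illustration, and only a few of the fields are pasted in to keep it short. It assumes each line has the form name:value and that the value may itself contain colons.

```python
def parse_form_data(pasted):
    """Turn 'name:value' lines copied from the browser's Network tab into a dict."""
    payload = {}
    for line in pasted.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only: values such as the one for
        # GridView_custwindowWS contain colons themselves.
        name, _, value = line.partition(':')
        # strip() also copes with a "name: value" style of display.
        payload[name] = value.strip()
    return payload


# A few of the fields listed above, pasted as-is (hypothetical subset).
pasted = """\
exportType:CSV
tabNumber:
CSV_CH:1
GridView_custwindowWS:0:0:-1:-10000:-10000:0:1px:-10000:1:0:0:0
GridView$DXKVInput:[]
"""
payload = parse_form_data(pasted)
```

The resulting dict can be passed as data=payload to requests.post. To find out which fields the server really needs, you could start from the full set and remove one field at a time until the export stops working.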
That said, when using stream=True, you should use iter_content. So your code would look something like:
payload = {
    # Form contents
}
r = requests.post(url, data=payload, stream=True)
with open(filename, 'wb') as output:
    # iter_content() defaults to 1-byte chunks; a larger chunk_size is faster
    for chunk in r.iter_content(chunk_size=8192):
        output.write(chunk)
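The for loop writes each chunk to your file as soon as it is available, so you don't have to worry about the script hanging while it waits for the whole response to arrive.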
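Since the original problem was an error page being saved without any exception, it may help to check the response before writing it. The following is a rough sketch, not a definitive fix: the helper name save_if_csv is made up, and it assumes the site's error page comes back with an HTML Content-Type, which I have not verified for this site.

```python
def save_if_csv(response, filename, chunk_size=8192):
    """Stream `response` to `filename`, refusing anything that looks like HTML.

    `response` is expected to behave like a requests.Response, i.e. to
    provide raise_for_status(), headers and iter_content().
    """
    response.raise_for_status()  # raises on 4xx/5xx status codes
    content_type = response.headers.get('Content-Type', '')
    if 'html' in content_type.lower():
        # Assumption: an error page is served as text/html, not as CSV.
        raise ValueError('expected a CSV download, got ' + content_type)
    with open(filename, 'wb') as output:
        for chunk in response.iter_content(chunk_size=chunk_size):
            output.write(chunk)
    return filename


# Intended use with requests (url, payload and filename as defined earlier):
#   r = requests.post(url, data=payload, stream=True, timeout=30)
#   save_if_csv(r, filename)
```

Passing a timeout to requests.post is optional, but without one a stalled server can leave the request waiting indefinitely.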