Python mechanize file download

Time: 2013-12-05 22:30:10

Tags: python submit download urllib2 mechanize

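I am trying to download an Excel file from a website. I use mechanize to fill in the form successfully, and submitting the form should return a file download. However, the response contains HTML instead of the actual contents of the file.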

import mechanize
br = mechanize.Browser()
br.open("http://web.sba.gov/pro-net/search/dsp_dsbs.cfm")
br.select_form('SearchForm')
br["States"] = ["AL","AK"]
br["E8a"] = ["Y"]
br["Report"] = ["S"]

response = br.submit()
fileobj = open("szz.txt","wb")
fileobj.write(response.read())
fileobj.close()

The result looks like this:

<!doctype html>
<html lang="en-US" dir="ltr">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<title>SBA - Dynamic Search</title>
<link     href="/gls/dsp_choosefunction.cfm" accesskey="1" rel="Home" title="Home (Return to GLS Choose Function)">
<link     rel="stylesheet" type="text/css" media="all" href="/library/css/jquery.mobile/sba.dtv.css?CachedAsOf=2012-06-20T22:15"/><!-- local code -->
<link     rel="stylesheet" type="text/css" media="all" href="/library/css/sczz.strict.css?CachedAsOf=2013-09-20T18:55"/>
<script src="/library/javascripts/jquery/jquery.js?CachedAsOf=2012-09-21T15:37"></script><!-- 1.8.2 -->
<script src="/library/javascripts/jquery/jquery.mobile/sba.jqm.js?CachedAsOf=2013-03-28T16:11"></script><!-- local code -->
<noscript>
    <link rel="stylesheet" type="text/css" media="all" href="/library/css/sczz.noscript.css?CachedAsOf=2010-10-14T19:23"/>
</noscript>
<script>
var gSlafDevTestProd                    = "Prod";
var gSlafDevTestProdInd                 = "2";
var gSlafInlineBlock                    = "inline-block";

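1 answer:

Answer 0 (score: 2)

I found a few errors in your code. I tried the following code, and opening the resulting file in a browser showed a well-formed table, so give it a try: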

import mechanize

br = mechanize.Browser()
br.open("http://web.sba.gov/pro-net/search/dsp_dsbs.cfm")
br.select_form('SearchForm')
# Set the values on the selected form; the control is named "State", not "States"
br.form["State"] = ["AL","AK"]
br.form["E8a"] = ["Y"]
br.form["Report"] = ["S"]

response = br.submit()
# The response is an HTML page, so save it with an .html extension
with open("szz.html","wb") as fileobj:
    fileobj.write(response.read())

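Basically, you need to set the values through br.form[control_name], and the key "States" was wrong: the control is simply named "State". Now save the file as .html and open it in a browser to see whether this is what you are looking for.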
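A mistyped control name and an HTML page saved under the wrong extension are both easy to miss, so here is a minimal sketch of two checks that might help. The helper names suggest_control_name and looks_like_html are hypothetical and not part of mechanize, and the list of control names is hard-coded for illustration. With mechanize, the names could be read from br.form.controls after select_form, and the headers from response.info().

```python
import difflib


def suggest_control_name(wanted, available):
    """Return `wanted` if it exists, else the closest known name, else None."""
    if wanted in available:
        return wanted
    matches = difflib.get_close_matches(wanted, available, n=1)
    return matches[0] if matches else None


def looks_like_html(content_type, body):
    """Guess whether a response is an HTML page rather than a file download.

    content_type: value of the Content-Type header, or None if absent.
    body: the first bytes of the response body.
    """
    if content_type and "html" in content_type.lower():
        return True
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Assumption: these names stand in for [c.name for c in br.form.controls].
available = ["State", "E8a", "Report"]
print(suggest_control_name("States", available))

# Assumption: header and body values are made up for illustration.
print(looks_like_html("text/html; charset=utf-8", b"<!doctype html>\n<html>"))
```

If looks_like_html returns True, the server answered with a page rather than a spreadsheet, so saving the body as .html and opening it in a browser is the sensible next step.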