Writing results to .xls (submitting 2 queries to a web page and storing the different results in .xls)

Posted: 2014-08-05 07:48:46

Tags: python excel beautifulsoup xlwt

Hi all... I'm using Python 2.7.6 to submit queries to an .aspx web page and fetch the results with BeautifulSoup, and I'd like to store them in an Excel spreadsheet.

import mechanize
import re
import xlwt
from bs4 import BeautifulSoup
import urllib2

book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('Legi', cell_overwrite_ok = True)

for items in ['university student', 'high school student']:

    url = r'http://legistar.council.nyc.gov/Legislation.aspx'
    request = mechanize.Request(url)
    response = mechanize.urlopen(request)
    forms = mechanize.ParseResponse(response, backwards_compat=False)
    form = forms[0]
    response.close()

    form['ctl00$ContentPlaceHolder1$txtSearch'] = items

    submit_page = mechanize.urlopen(form.click())
    soup = BeautifulSoup(submit_page.read())
    aa = soup.find_all(href=re.compile('LegislationDetail'))
    for bb in aa:
        cc = bb.text

        #print cc
        results = []
        results.append(cc)

    for row, legi_no in enumerate(results):
      sheet.write (row, 0, legi_no)

book.save("C:\\legi results.xls")

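If I print the variable 'cc', it finds and retrieves the results. However, writing to the Excel spreadsheet doesn't work: only the first cell gets written.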

Any help would be greatly appreciated. Thanks.

1 answer:

Answer 0 (score: 1)

You create the 'results' variable inside the 'for bb in aa' loop.

This means 'results' is reset to [] for every value in 'aa', so at the end it contains only one element (the last one), which is not what you want.
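To see the effect in isolation, here is a minimal sketch that drops the scraping and uses a hypothetical list of link texts in place of what find_all() returns. The function names collect_buggy and collect_fixed are illustrative only. The only difference between them is where results = [] sits.

```python
# Hypothetical stand-ins for the link texts that soup.find_all() returns.
link_texts = ['Int 0001-2014', 'Res 0002-2014', 'Int 0003-2014']

def collect_buggy(texts):
    for cc in texts:
        results = []          # re-created on every iteration...
        results.append(cc)    # ...so it only ever holds the current item
    return results

def collect_fixed(texts):
    results = []              # created once, before the loop
    for cc in texts:
        results.append(cc)    # accumulates every item
    return results

buggy = collect_buggy(link_texts)   # only the last link text survives
fixed = collect_fixed(link_texts)   # all three link texts are kept
```

With the buggy version, enumerate(results) yields a single (0, text) pair, which is why only one cell ends up in the sheet.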

Initialize 'results' outside that inner loop and it should work, as shown below.

import mechanize
import re
import xlwt
from bs4 import BeautifulSoup
import urllib2

book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('Legi', cell_overwrite_ok = True)

for col, items in enumerate(['university student', 'high school student']):

    url = r'http://legistar.council.nyc.gov/Legislation.aspx'
    request = mechanize.Request(url)
    response = mechanize.urlopen(request)
    forms = mechanize.ParseResponse(response, backwards_compat=False)
    form = forms[0]
    response.close()

    form['ctl00$ContentPlaceHolder1$txtSearch'] = items

    submit_page = mechanize.urlopen(form.click())
    soup = BeautifulSoup(submit_page.read(), 'html.parser')
    aa = soup.find_all(href=re.compile('LegislationDetail'))

    results = [] # Initialize results here, once per search term !!!
    for bb in aa:
        cc = bb.text

        #print cc
        results.append(cc)

    # Write each search term's results to its own column (col), so the
    # second query does not overwrite the cells written by the first one.
    for row, legi_no in enumerate(results):
        sheet.write(row, col, legi_no)

book.save("C:\\legi results.xls")
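When several queries write into the same sheet, each one needs its own column (or its own row offset). The sheet is created with cell_overwrite_ok=True, so if every query starts at row 0, column 0, the later results silently replace the earlier ones. The sketch below assumes one column per search term with the term itself as a header in row 0. That layout and the helper name layout_cells are illustrative choices, not part of the original code. It computes (row, col, value) triples without touching xlwt, so the layout can be checked on its own.

```python
def layout_cells(results_by_query):
    """Return (row, col, value) triples: one column per query, header in row 0."""
    cells = []
    for col, (query, results) in enumerate(results_by_query):
        cells.append((0, col, query))                  # header row
        for row, value in enumerate(results, start=1):
            cells.append((row, col, value))            # data rows below the header
    return cells

# Hypothetical scraped results; a list of pairs keeps the column order stable
# (plain dicts are unordered in Python 2.7).
scraped = [
    ('university student', ['Int 0001-2014', 'Res 0002-2014']),
    ('high school student', ['Int 0003-2014']),
]
cells = layout_cells(scraped)

# With xlwt, the triples map directly onto Worksheet.write(row, col, value):
# for row, col, value in cells:
#     sheet.write(row, col, value)
```

Because no two triples share the same (row, col), nothing is overwritten even with cell_overwrite_ok=True.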