Hi all... I'm using Python 2.7 to submit a query to an .aspx web page and fetch the results with BeautifulSoup, and I want to store them in an Excel spreadsheet.
```python
import mechanize
import re
import xlwt
from bs4 import BeautifulSoup
import urllib2
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('Legi', cell_overwrite_ok = True)
for items in ['university student', 'high school student']:
    url = r'http://legistar.council.nyc.gov/Legislation.aspx'
    request = mechanize.Request(url)
    response = mechanize.urlopen(request)
    forms = mechanize.ParseResponse(response, backwards_compat=False)
    form = forms[0]
    response.close()
    form['ctl00$ContentPlaceHolder1$txtSearch'] = items
    submit_page = mechanize.urlopen(form.click())
    soup = BeautifulSoup(submit_page.read())
    aa = soup.find_all(href=re.compile('LegislationDetail'))
    for bb in aa:
        cc = bb.text
        #print cc
        results = []
        results.append(cc)
    for row, legi_no in enumerate(results):
        sheet.write (row, 0, legi_no)
book.save("C:\\legi results.xls")
```
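If I print the variable `cc`, it finds and fetches the results. However, writing to the Excel spreadsheet doesn't work: only the first cell gets written.
Any help would be greatly appreciated. Thanks.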
Answer (score: 1)
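You are creating the `results` variable inside the `for bb in aa` loop.
This means `results` is re-initialized to `[]` for every value in `aa`, so at the end `results` contains only one element (the last one), which is not what you want. Because the list never holds more than one element, `enumerate(results)` only ever yields row 0, which is why everything lands in the first cell.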
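The effect can be reproduced without any network access. The sketch below is a minimal stand-in that assumes hypothetical placeholder strings (`link 1` to `link 3`) in place of the scraped `bb.text` values; `collect_buggy` and `collect_fixed` are illustrative names, not part of the original code.

```python
# Hypothetical placeholder strings standing in for the scraped bb.text values.
link_texts = ["link 1", "link 2", "link 3"]


def collect_buggy(texts):
    """Mirrors the question: the list is re-created inside the loop."""
    results = []  # only here so an empty input does not raise UnboundLocalError
    for text in texts:
        results = []  # reset on every iteration, so earlier items are lost
        results.append(text)
    return results  # holds only the last item


def collect_fixed(texts):
    """Mirrors the answer: the list is created once, before the loop."""
    results = []
    for text in texts:
        results.append(text)
    return results


print(collect_buggy(link_texts))                   # ['link 3']
print(list(enumerate(collect_buggy(link_texts))))  # [(0, 'link 3')]: only row 0 is written
print(collect_fixed(link_texts))                   # ['link 1', 'link 2', 'link 3']
```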
Initialize `results` outside the `for bb in aa` loop and it should work, as shown below.
```python
import re

import mechanize
import xlwt
from bs4 import BeautifulSoup

book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('Legi', cell_overwrite_ok=True)
url = r'http://legistar.council.nyc.gov/Legislation.aspx'

# Initialize results once, outside the loops, so neither the next link
# nor the next search term resets it
results = []
for items in ['university student', 'high school student']:
    # Load a fresh copy of the search form for each query
    request = mechanize.Request(url)
    response = mechanize.urlopen(request)
    forms = mechanize.ParseResponse(response, backwards_compat=False)
    form = forms[0]
    response.close()

    # Fill in the search box and submit the form
    form['ctl00$ContentPlaceHolder1$txtSearch'] = items
    submit_page = mechanize.urlopen(form.click())
    soup = BeautifulSoup(submit_page.read(), 'html.parser')

    # Collect the text of every link that points to a LegislationDetail page
    aa = soup.find_all(href=re.compile('LegislationDetail'))
    for bb in aa:
        cc = bb.text
        #print cc
        results.append(cc)

# Write each collected result to its own row, after all searches are done,
# so the second search does not overwrite the rows of the first
for row, legi_no in enumerate(results):
    sheet.write(row, 0, legi_no)
book.save("C:\\legi results.xls")
```
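If you also want to record which search term produced each row, one option is a single running row counter across all searches. The sketch below assumes the scraped texts have already been grouped as a list of `(search_term, [link_text, ...])` pairs; `build_rows`, `write_rows` and `FakeSheet` are hypothetical helpers, and `FakeSheet` only stands in for an xlwt worksheet so the example runs without Excel.

```python
def build_rows(results_by_term):
    """Flatten [(term, [text, ...]), ...] into (row, term, text) tuples.

    The row counter keeps running across search terms, so the second
    term's results start below the first term's instead of overwriting them.
    """
    rows = []
    row = 0
    for term, texts in results_by_term:
        for text in texts:
            rows.append((row, term, text))
            row += 1
    return rows


def write_rows(sheet, rows):
    """Column 0: link text (as in the original); column 1: search term (assumed layout).

    `sheet` can be anything with an xlwt-style write(row, col, value) method.
    """
    for row, term, text in rows:
        sheet.write(row, 0, text)
        sheet.write(row, 1, term)


class FakeSheet(object):
    """Hypothetical stand-in for an xlwt worksheet; stores cells in a dict."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


# Hypothetical scraped data, grouped by search term
scraped = [
    ("university student", ["link A", "link B"]),
    ("high school student", ["link C"]),
]
sheet = FakeSheet()
write_rows(sheet, build_rows(scraped))
print(build_rows(scraped))
```

With xlwt, pass the worksheet returned by `book.add_sheet('Legi')` to `write_rows` instead of `FakeSheet`, then call `book.save(...)` as before. Keeping the counter in one place means rows can never collide, even with `cell_overwrite_ok=True` silently allowing overwrites.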