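I am trying to scrape this PDF with ScraperWiki. The current code raises an error, name 'data' is not defined, but I only get the error on this line:
elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text
If I comment out that line, I get the same error in my else statement.
Here is my code: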
import scraperwiki
import urllib2, lxml.etree

# Pull Mondays
url = 'http://www.city.pittsburgh.pa.us/police/blotter/blotter_monday.pdf'
pdfdata = urllib2.urlopen(url).read()
xmldata = scraperwiki.pdftoxml(pdfdata)
root = lxml.etree.fromstring(xmldata)

# how many pages in PDF
pages = list(root)
print "There are", len(pages), "pages"

# Test Scrape of only Page 1 of 29
for page in pages[0:1]:
    for el in page:
        if el.tag == "text":
            if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }
            elif int(el.attrib['left']) < 317: data['Location of Occurrence'] = el.text
            elif int(el.attrib['left']) < 169: data['Incident Time'] = el.text
            elif int(el.attrib['left']) < 647: data['Neighborhood'] = el.text
            elif int(el.attrib['left']) < 338: data['Description'] = el.text
            else:
                data['Zone'] = el.text
                print data
What am I doing wrong?
Also, any suggestions for a better solution would be appreciated.
Answer 0 (score: 1)
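Unless you have skipped part of your code, the data dictionary is only created if the condition in this line matches:
if int(el.attrib['left']) < 11: data = { 'Report Name': el.text }
All the other lines that set values in data depend on it already existing, so if the first condition has not matched yet you will get a NameError.
A quick fix is to always create an empty data dictionary first, e.g.
data = {}
etc.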
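To make the quick fix concrete, here is a minimal sketch rather than a drop-in replacement. It is written for Python 3 and uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without ScraperWiki. SAMPLE_XML is a made-up fragment shaped like pdftoxml output, and the helper names field_for and parse_page are hypothetical. The column boundaries are the numbers from the question, but sorting them in ascending order is my assumption about the page layout. In the original if/elif chain the `< 169` test comes after `< 317` and the `< 338` test comes after `< 647`, so those two branches can never run.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like pdftoxml output; the real blotter layout may differ.
SAMPLE_XML = """
<pdf2xml>
  <page number="1">
    <text left="160" top="10">01:30</text>
    <text left="300" top="10">100 Block Main St</text>
    <text left="5" top="30">Offense 2.0</text>
    <text left="160" top="30">02:15</text>
    <text left="700" top="30">Zone 1</text>
  </page>
</pdf2xml>
"""

# Boundaries taken from the question, sorted ascending (an assumption)
# so that every column can actually be reached.
COLUMNS = [
    (11, "Report Name"),
    (169, "Incident Time"),
    (317, "Location of Occurrence"),
    (338, "Description"),
    (647, "Neighborhood"),
]


def field_for(left):
    """Return the field name for a text element's left offset."""
    for upper_bound, name in COLUMNS:
        if left < upper_bound:
            return name
    return "Zone"


def parse_page(page):
    """Collect rows from one page; a new row starts at each Report Name cell."""
    rows = []
    data = {}  # always exists, so assigning into it cannot raise NameError
    for el in page:
        if el.tag != "text":
            continue
        name = field_for(int(el.attrib["left"]))
        if name == "Report Name" and data:
            rows.append(data)  # previous row is finished
            data = {}
        data[name] = el.text
    if data:
        rows.append(data)  # do not lose the last row
    return rows


root = ET.fromstring(SAMPLE_XML)
rows = parse_page(list(root)[0])
for row in rows:
    print(row)
```

In the sample, the first text element on the page is not a Report Name cell, which is the situation that triggers the NameError in the question. Because data starts out as an empty dictionary, the assignment simply succeeds. Whether a new row should really begin at each Report Name cell depends on the actual PDF, so treat that part as a guess.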