Question

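I'm new to Python (in fact, this is my first Python project), and I'm having trouble writing this web scraper. I followed a tutorial, but the code doesn't produce any results. I'd really appreciate some help.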

from lxml import html
import requests

page = requests.get('http://openbook.sfgov.org/openbooks/cgi-bin/cognosisapi.dll?b_action=cognosViewer&ui.action=run&ui.object=/content/folder%5B%40name%3D%27Reports%27%5D/report%5B%40name%3D%27Budget%27%5D&ui.name=20Budget&run.outputFormat=&run.prompt=false')
tree = html.fromstring(page.content)

# This will find the table headers:
categories = tree.xpath('//*[@id="rt_NS_"]/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[2]/td/div/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td[1]')
# This will find the budgets
category_budget = tree.xpath('//*[@id="rt_NS_"]/tbody/tr[2]/td/table/tbody/tr[4]/td/table/tbody/tr[2]/td/div/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/span[1]')

print('Categories:', categories)
print('Budget:', category_budget)

Answer 1

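It looks like JavaScript is generating the contents of the table with id="rt_NS_".

In that case, requests won't help you: it only downloads the HTML the server sends and never runs the page's scripts. You can confirm that the table is missing from the raw response: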

import requests

page = requests.get('http://openbook.sfgov.org/openbooks/cgi-bin/cognosisapi.dll?b_action=cognosViewer&ui.action=run&ui.object=/content/folder%5B%40name%3D%27Reports%27%5D/report%5B%40name%3D%27Budget%27%5D&ui.name=20Budget&run.outputFormat=&run.prompt=false')

# page.text is the decoded response body, so it can be searched as a string
ctx = page.text
if 'id="rt_NS_"' in ctx:
    print("Found!")
else:
    print("Not Found!")

Not Found!
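
The substring test above only matches one exact spelling of the attribute, and it would also report a hit if the text merely appeared inside a script. As a rough sketch, the same check can be done with the standard library's HTML parser, which looks at real start tags only. The names `_IdFinder` and `has_element_with_id` are made up for this example.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_element_with_id(html_text, wanted_id):
    """Return True if html_text contains an element whose id equals wanted_id."""
    finder = _IdFinder(wanted_id)
    finder.feed(html_text)
    finder.close()
    return finder.found
```

With the page fetched as above, `has_element_with_id(page.text, "rt_NS_")` should give the same answer as the substring test, regardless of how the attribute is quoted.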

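You need to use a different approach. Selenium with Python may be one option, since it drives a real browser that runs the page's JavaScript.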
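
Here is a minimal sketch of what the Selenium route could look like. It is untested against this site and makes several assumptions: that the table with id "rt_NS_" ends up in the top-level document rather than in an iframe, that it appears within 30 seconds, and that headless Chrome is available. The function names `wait_for_element_html` and `scrape_with_chrome` are made up for this example. The wait is written as a plain polling loop so it works with any object that has `get` and `find_elements`; Selenium's own `WebDriverWait` would be the more usual choice in real code.

```python
import time


def wait_for_element_html(driver, url, element_id="rt_NS_", timeout=30.0, poll=0.5):
    """Open url and return the element's outerHTML once JavaScript has created it."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "id" is the string value of selenium.webdriver.common.by.By.ID
        matches = driver.find_elements("id", element_id)
        if matches:
            return matches[0].get_attribute("outerHTML")
        if time.monotonic() >= deadline:
            raise TimeoutError("no element with id %r after %s s" % (element_id, timeout))
        time.sleep(poll)


def scrape_with_chrome(url, element_id="rt_NS_"):
    """Run the wait above in headless Chrome (assumes selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily; only needed for a real browser

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return wait_for_element_html(driver, url, element_id)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

The string returned by `scrape_with_chrome(url)` is the rendered table markup, so it can be handed to `html.fromstring` from lxml and queried with XPath in the same way as in the question.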
