Question

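I wrote a Scrapy spider to scrape some HTML tags. The problem is that the spider works perfectly for URLs served on the internet, but not for URLs served on localhost. That is, the spider raises an error for the URL of a resource on my local machine, even though the URL is correct and the same resource works fine when I use the live site's URL. Can someone clear this up for me?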

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        con = MySQLdb.connect(host="localhost",
                              user = "username",
                              passwd="psswd",
                              db ="dbname")
        cur = con.cursor()
        title = hxs.select("//h3")[0].extract()
        desc = hxs.select("//h2").extract()
        a = hxs.select("//meta").extract()
        cur.execute("""Insert into heads(h2) Values(%s )""",(a))
        con.commit()
        con.close()

Answer 1

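The error

    exceptions.IndexError: list index out of range

on this line

    title = hxs.select("//h3")[0].extract()

means that the list returned by hxs.select("//h3") is empty ([]): hxs.select("//h3")[0] asks for the first item (index 0), and Python raises IndexError because that index is out of range.

The HTML you are parsing evidently contains no <h3> tags.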
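To make the failure mode concrete, here is a minimal sketch that uses plain Python lists to stand in for the result of hxs.select("//h3"). The helper name first_or_none is hypothetical and is not part of Scrapy; it only illustrates checking for an empty result before indexing.

```python
def first_or_none(matches):
    """Return the first element of a list of matches, or None if it is empty."""
    return matches[0] if matches else None


# An empty list stands in for hxs.select("//h3") on a page without <h3> tags.
no_matches = []

try:
    no_matches[0]  # same indexing pattern as hxs.select("//h3")[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)

# With the guard, an empty result yields None instead of raising.
print(first_or_none(no_matches))
print(first_or_none(["<h3>Title</h3>"]))
```

In the spider, the same check would let you log or skip pages that have no <h3> instead of crashing, which also makes it easier to see whether the localhost page really returns the markup you expect.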

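Also, once you have fixed the error above, you need to add a comma after a, so that the parameter is written (a,):

    cur.execute("""Insert into heads(h2) Values(%s )""",(a,))

(a) evaluates to just a, whereas (a,) is a tuple containing one element.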
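The difference between the two spellings can be checked directly in Python. This is a small sketch with an assumed placeholder value for a; it does not touch the database.

```python
a = "some value"  # placeholder; in the spider, a comes from the selector

plain = (a)     # parentheses only group the expression; this is still a
single = (a,)   # the trailing comma is what creates a one-element tuple

print(type(plain).__name__)
print(type(single).__name__)
print(len(single))
```

It is the comma, not the parentheses, that makes the tuple, which is why the parameters argument to cur.execute needs the trailing comma when there is only one value.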

Does python scrapy work properly on localhost?
