Question

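I am using Scrapy to write a spider that scrapes content from dmoz.org.

When I test with response.xpath in the Python shell, I get what I want, but when I run the spider from cmd, I get nothing. I'm confused.

Here is my spider code: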

import scrapy
from kecheng3.items import Kecheng3Item

class DmozSpiderSpider(scrapy.Spider):
    name = "dmoz_spider"
    allowed_domains = ["dmoz.org"]
    start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]

    def parse(self, response):
        for divm in response.xpath('//*[@id="site-list-content"]/div'):
            item = Kecheng3Item()
            item['title'] = divm.xpath('/div[3]/a/div/text()').extract()
            item['link'] = divm.xpath('/div[3]/a/@href').extract()
            item['desc'] = divm.xpath('/div[3]/div/text()').extract()
            yield item

screenshot 1

screenshot 2

Answer 1

item['title'] = divm.xpath('./div[3]/a/div/text()').extract()
item['link'] = divm.xpath('./div[3]/a/@href').extract()
item['desc'] = divm.xpath('./div[3]/div/text()').extract()

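A leading / means the document root, so '/div[3]/...' is evaluated from the top of the document rather than from divm, and it matches nothing.

./ means the current node, which in your case is the divm node.

A path with no leading slash is relative to the current node by default, so you can also write: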

item['title'] = divm.xpath('div[3]/a/div/text()').extract()
