Question

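I followed the tutorial on the Scrapy site and tried editing the code to practice on Wikipedia. When I run it, it outputs the text from the page, but it does so hundreds of times. Both the JSON file and the console contain the same content printed over and over. I think it might have something to do with the function? Also, what is the difference between sel.xpath and site.xpath?

Thanks!

Here is the code: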

from scrapy.spider import Spider
from scrapy.selector import Selector

from tutorial.items import DmozItem

class DmozSpider(Spider):
   name = "dmoz"
   allowed_domains = ["wikipedia.com"]
   start_urls = [
       "http://en.wikipedia.org/wiki/Caesar_Hull"
   ]

   def parse(self, response):
       sel = Selector(response)
       sites = sel.xpath('//div')
       items = []
       for site in sites:
           item = DmozItem()
           item['title'] = sel.xpath('.//p/text()').extract()
           items.append(item)
       return items

Answer 1

If you want the second XPath to be evaluated relative to the first one, then instead of:

item['title'] = sel.xpath('.//p/text()').extract()

use:

item['title'] = site.xpath('.//p/text()').extract()

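The loop over //div creates one item for every div found in the document.

sel is the selector for the whole response, while site is the selector for a single div from the loop. Running sel.xpath('.//p/text()') is therefore the same as running sel.xpath('//p/text()'): every iteration queries the entire page, so you get the same result over and over again.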
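The difference between querying from the document root and querying from each div can be seen in a small sketch. It does not use Scrapy: as an assumption made only to keep the example dependency-free, the standard library's xml.etree.ElementTree stands in for Scrapy's Selector, with root playing the role of sel and div playing the role of site. The two-div markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two divs, each with one paragraph.
HTML = """<html><body>
<div><p>first</p></div>
<div><p>second</p></div>
</body></html>"""

root = ET.fromstring(HTML)      # whole document, like `sel`
divs = root.findall('.//div')   # one element per div, like `sites`

# Querying from the root inside the loop ignores the current div,
# so every iteration returns every paragraph in the document.
from_root = [[p.text for p in root.findall('.//p')] for _ in divs]

# Querying from the current div returns only that div's paragraphs.
from_div = [[p.text for p in div.findall('.//p')] for div in divs]

print(from_root)
print(from_div)
```

On a real page such as a Wikipedia article, divs are often nested, so even the relative query can return the same paragraph for both an outer div and an inner one. A more specific starting XPath than //div would likely reduce that overlap.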

Scrapy outputs the same thing hundreds of times
