Question

I am trying to scrape data from gelbeseiten.de (the German Yellow Pages):

# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider
from scrapy.http import Request
from scrapy.selector import Selector
from scrapy.http import HtmlResponse


class GelbeseitenSpider(scrapy.Spider):
  name = "gelbeseiten"
  allowed_domains = ["gelbeseiten.de"]  # domain only, no URL scheme
  start_urls = ['http://www.gelbeseiten.de/zoohandlungen/s1/alphabetisch']

  def parse(self, response):
    for adress in response.css('article'):
      # Street
      strasse = adress.xpath('//span[@itemprop="streetAddress"]//text()').extract_first()

      # Name
      name = adress.xpath('//span[@itemprop="name"]//text()').extract_first()

      # Postal code
      plz = adress.xpath('//span[@itemprop="postalCode"]//text()').extract_first()

      # City
      stadt = adress.xpath('//span[@itemprop="addressLocality"]//text()').extract_first()

      yield {
        'name': name,
        'strasse': strasse,
        'plz': plz,
        'stadt': stadt,
      }

As a result I get the same address 15 times, but I expected 15 different addresses.

Any help is appreciated.

Answer 1

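You are using absolute XPath expressions:

adress.xpath('//span[@itemprop="streetAddress"]//text()')

while you should be using expressions relative to adress (note the leading dot in the expression):

adress.xpath('.//span[@itemprop="streetAddress"]//text()')

An expression that starts with // searches from the document root, no matter which selector it is called on, so every iteration of the loop finds the first match on the whole page. With the leading dot the search starts at the current article element.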
