Question

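I recently started learning Scrapy (and Python) and ran into a strange problem that I have not been able to explain so far. I found a workaround (see below), but I would really like to know the reason behind the behavior of .extract().

Running the following in my parse function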

item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').extract

makes Scrapy write the following full string (?) to the configured output CSV instead of the data:

<bound method SelectorList.extract of 
[<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'K\xf6ln Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Siegburg/Bonn'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Frankfurt(M) Flughafen Fernbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Mannheim Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Karlsruhe Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Offenburg'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Freiburg(Breisgau) Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel Bad Bf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel SBB'>]>

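The data is matched correctly, but it is not passed on to the item. Other functions that use .re() instead of .extract() work fine. Surprisingly, the query above also works fine if I run it like this: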

item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').re('.*')

Answer 1

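Hope this helps. Note the parentheses: .extract without () is only a reference to the method itself, which is why its repr ended up in your CSV. .extract() actually calls it and returns the strings.

from scrapy.selector import Selector

sel = Selector(response)
# extract() is called here; [0] takes the first matching href
item['stops'] = sel.xpath('//td[@class="station"]/a/@href').extract()[0]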

Scrapy: Selector with .extract returns the full element (but the data is assigned correctly)
