Question

<a href="../legislation/Legislation.aspx?id=62397"><span style="cursor:pointer;" title="weight = 2">Expiry Date</span> + 5 years</a>

How can I extract this data as "Expiry Date + 5 years" with a single line of code?

response.xpath('//tr[@style="cursor:pointer;"]/td[1]/a/span/text() | //tr[@style="cursor:pointer;"]/td[1]/a/text()').extract()

This returns two separate elements: "Expiry Date" and "+ 5 years".

I'm working on a table, so there are many rows like this one, and for each of them I want to concatenate the information.

What I get is [u'Expiry Date', u'+ 5 years', u'Due Date', u'+ 4 years', u'Creation', u'+ 3 years'], but what I want is [Expiry Date + 5 years, Due Date + 4 years, Creation + 3 years]. Many thanks.
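If you already have the flat list from the union query, a minimal pure-Python sketch can pair neighbouring elements. It assumes every row yields exactly two text nodes, label first and period second; a row with a missing or extra text node would shift every pair after it, so the per-link approaches in the answers are more robust.

```python
# Flat list as returned by the union XPath in the question:
# span text and link text come back as separate, alternating nodes.
flat = [u'Expiry Date', u'+ 5 years', u'Due Date', u'+ 4 years',
        u'Creation', u'+ 3 years']

# Assumes each link contributes exactly two text nodes (label, then period),
# so even indices are labels and odd indices are periods.
labels = flat[0::2]
periods = flat[1::2]

combined = [u"%s %s" % (label, period) for label, period in zip(labels, periods)]
print(combined)
# ['Expiry Date + 5 years', 'Due Date + 4 years', 'Creation + 3 years']
```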

Answer 1

You can join all the text nodes inside the a element:

"".join(response.xpath("//a[contains(@href, 'Legislation')]//text()").extract())

Demo:

$ scrapy shell index.html
In [1]: "".join(response.xpath("//a[contains(@href, 'Legislation')]//text()").extract())
Out[1]: u'Expiry Date + 5 years'
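One caveat for a table with many rows: "//a[contains(@href, 'Legislation')]//text()" matches the text nodes of every matching link at once, so joining them produces one long string for the whole page. Joining per link avoids that; in Scrapy this would be a list comprehension over response.xpath("//a[contains(@href, 'Legislation')]") that calls "".join(a.xpath(".//text()").extract()) on each link (note the relative ".//"). The sketch below shows the same idea with the standard library's xml.etree.ElementTree instead of Scrapy, so it runs without a crawler; the table markup is a hypothetical, well-formed stand-in shaped like the snippet in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table: one <a> per row, like the snippet in the question.
html = """
<table>
  <tr style="cursor:pointer;"><td><a href="../legislation/Legislation.aspx?id=62397"><span style="cursor:pointer;" title="weight = 2">Expiry Date</span> + 5 years</a></td></tr>
  <tr style="cursor:pointer;"><td><a href="../legislation/Legislation.aspx?id=2"><span style="cursor:pointer;">Due Date</span> + 4 years</a></td></tr>
  <tr style="cursor:pointer;"><td><a href="../legislation/Legislation.aspx?id=3"><span style="cursor:pointer;">Creation</span> + 3 years</a></td></tr>
</table>
"""

root = ET.fromstring(html.strip())

# ElementTree's XPath subset has no contains(), so filter on href in Python.
# itertext() yields the element's text, then each child's text and tail,
# in document order: "Expiry Date" followed by " + 5 years".
rows = [
    "".join(a.itertext())
    for a in root.iter("a")
    if "Legislation" in a.get("href", "")
]
print(rows)
# ['Expiry Date + 5 years', 'Due Date + 4 years', 'Creation + 3 years']
```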

Answer 2

In the end I got a solution, even if it isn't elegant...

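    # leghxs is assumed to be the page's selector (e.g. the Scrapy response)
    retentionEvent = []
    retentionPeriod = leghxs.xpath('//a[contains(@href, "Legislation")]')
    for each in retentionPeriod:
        # text of the <span> child, e.g. [u'Expiry Date']
        event = each.xpath("span/text()").extract()
        # the <a> element's own text node, e.g. [u' + 5 years']
        period = each.xpath("text()").extract()
        # concatenating the two lists gives one list per link
        retentionEvent.append(event + period)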

This gives you a list of lists. Then, while scraping with Scrapy, you can assign each list (e.g. Expiry Date + 5 years) to item[key]:

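    for eachretention in retentionEvent:
        # RetentionElement is assumed to be the project's scrapy.Item subclass
        item = RetentionElement()
        item['time'] = eachretention
        # yield from the spider callback so Scrapy collects the item
        yield item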
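Because each entry of retentionEvent is still a list like [u'Expiry Date', u' + 5 years'], item['time'] ends up holding a list rather than the single string the question asked for. A minimal sketch of joining the pieces before assignment follows; the sample values are hypothetical, and a plain dict stands in for the RetentionElement item class so the snippet runs without Scrapy.

```python
# retentionEvent as built by the loop above: one [event, period] list per link.
# Sample values are hypothetical; the period text keeps its leading space.
retentionEvent = [
    [u"Expiry Date", u" + 5 years"],
    [u"Due Date", u" + 4 years"],
    [u"Creation", u" + 3 years"],
]

items = []
for eachretention in retentionEvent:
    # A plain dict stands in for the RetentionElement item here.
    item = {}
    # Join the pieces so the field holds one string instead of a list;
    # strip() removes any stray whitespace at either end.
    item["time"] = u"".join(eachretention).strip()
    items.append(item)

print([item["time"] for item in items])
# ['Expiry Date + 5 years', 'Due Date + 4 years', 'Creation + 3 years']
```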

How can I extract this information as a single node in XPath 1.0?
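In XPath 1.0 the string-value of an element is the concatenation of all its descendant text nodes, so evaluating string(.) or normalize-space(.) on each a element yields "Expiry Date + 5 years" as one string. A caveat: when these functions are given a node-set such as //a, XPath 1.0 uses only the first node in document order, so the expression has to be evaluated once per link, e.g. in Scrapy [a.xpath('normalize-space(.)').extract_first() for a in response.xpath("//a[contains(@href, 'Legislation')]")]. ElementTree does not implement XPath string functions, so the standard-library sketch below approximates normalize-space() by hand; normalize_space is a hypothetical helper written for illustration, and str.split() treats a slightly wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET


def normalize_space(element):
    """Approximate XPath 1.0 normalize-space(.) for an ElementTree element.

    The string-value is the concatenation of all descendant text nodes;
    normalize-space then trims leading/trailing whitespace and collapses
    internal runs of whitespace into a single space.
    """
    string_value = "".join(element.itertext())
    return " ".join(string_value.split())


# The link from the question, with extra whitespace added to show the collapsing.
link = ET.fromstring(
    '<a href="../legislation/Legislation.aspx?id=62397">'
    '<span style="cursor:pointer;" title="weight = 2">Expiry Date</span>'
    '   +  5 years\n</a>'
)

print(normalize_space(link))  # Expiry Date + 5 years
```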
