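Scrapy returns all values in a single cell

Time: 2018-09-03 17:05:44

Tags: python scrapy

I am trying to scrape this website with Scrapy, but all the values come back in a single cell instead of each value in its own row. What I expect is this: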

Example:
Mileage: 25
Mileage: 377
Mileage: 247433
Mileage: 464130

But I am getting data like this:

Example:
Mileage: [u'25',
 u'377',
 u'247433',
 u'399109',
 u'464130',
 u'399631',
 u'435238',
 u'285000',
 u'287470',
 u'280000']

Here is my code:

import scrapy
from scrapy.selector import HtmlXPathSelector

from ..items import ExampleItem

url = 'https://example.com'


class Example(scrapy.Spider):
    name = 'example'
    allowed_domains = ['www.example.com']
    start_urls = [url]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item_selector = hxs.select('//div[@class="listing_format card5 relative"]')
        for fields in item_selector:
            item = ExampleItem()
            item['Mileage'] = fields.select('//li[strong="Mileage"]/span/text()').extract()
            yield item

2 answers:

Answer 0 (score: 1)

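You haven't shown the website, but you probably need a relative XPath. A path that starts with // is evaluated against the whole document even when it is called on fields, so every listing returns every mileage on the page. Start the path with .// to search only inside the current listing, and use extract_first() to get a single string instead of a list:

item['Mileage'] = fields.select('.//li[strong="Mileage"]/span/text()').extract_first()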

Answer 1 (score: 0)

It sounds like you need to loop over the mileage values and yield one item per value.

for fields in item_selector:
    mileages = fields.select('//li[strong="Mileage"]/span/text()').extract()
    for mileage in mileages:
        item = ExampleItem()
        item['Mileage'] = mileage
        yield item

Also consider making the XPath in fields.select('//li[strong="Mileage"]/span/text()').extract() more specific.