I am trying to scrape this website with Scrapy, but all the values end up in a single cell instead of each value going on its own row.
example:
milage: 25
milage: 377
milage: 247433
milage: 464130
However, this is the data I am getting:
example:
milage:[u'25',
u'377',
u'247433',
u'399109',
u'464130',
u'399631',
u'435238',
u'285000',
u'287470',
u'280000']
Here is my code:
import scrapy
from ..items import ExampleItem
from scrapy.selector import HtmlXPathSelector

url = 'https://example.com'


class Example(scrapy.Spider):
    name = 'example'
    allowed_domains = ['www.example.com']
    start_urls = [url]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item_selector = hxs.select('//div[@class="listing_format card5 relative"]')
        for fields in item_selector:
            item = ExampleItem()
            item['Mileage'] = fields.select('//li[strong="Mileage"]/span/text()').extract()
            yield item
Answer 0 (score: 1)
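You haven't shown the website, but you probably need a relative XPath. A path that starts with // searches the whole document even when it is called on fields, whereas .// searches only inside the current listing: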
item ['Mileage'] = fields.select('.//li[strong="Mileage"]/span/text()').extract_first()
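
To make the difference concrete, here is a minimal sketch that does not need Scrapy at all. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the PAGE markup is a hypothetical, simplified version of the listing page (the real HTML was not shown). ElementTree does not accept an absolute // path on an element, so the page-wide behaviour is imitated by querying from the document root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the real listing page.
PAGE = """
<html><body>
  <div class="listing_format card5 relative">
    <ul><li><strong>Mileage</strong><span>25</span></li></ul>
  </div>
  <div class="listing_format card5 relative">
    <ul><li><strong>Mileage</strong><span>377</span></li></ul>
  </div>
  <div class="listing_format card5 relative">
    <ul><li><strong>Mileage</strong><span>247433</span></li></ul>
  </div>
</body></html>
""".strip()

# Relative path: "the mileage <span> somewhere below the current node".
MILEAGE = ".//li[strong='Mileage']/span"

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='listing_format card5 relative']")

# Querying from the document root for every card imitates the absolute
# '//li[...]' query: each card receives the full list of mileages.
page_wide = [[span.text for span in root.findall(MILEAGE)] for _ in cards]

# Querying from the card itself returns only that card's own mileage.
per_card = [card.findtext(MILEAGE) for card in cards]

# One item per listing, each holding a single value.
items = [{"Mileage": value} for value in per_card]
for item in items:
    print(item)
```

In the spider itself the same idea should apply by keeping the leading dot in the expression passed to fields.select(...), as in the line above.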
Answer 1 (score: 0)
It sounds like you need to iterate over the mileages.
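for fields in item_selector:
    # extract() returns a list with every matching mileage string
    milages = fields.select('//li[strong="Mileage"]/span/text()').extract()
    for milage in milages:
        # yield one item per mileage value instead of one item holding the whole list
        item = CommercialtrucktraderItem()
        item['Mileage'] = milage
        yield item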
Also, consider making the fields.select('//li[strong="Mileage"]/span/text()').extract() query more specific.