Question

Another topic ^^ Based on the suggestions here, I implemented the following in my bot and tested it in the shell:

    name_list = response.css("h2.label.title::text").extract()
    packaging_list = response.css("div.label.packaging::text").extract()
    ean = response.css("h1.page-title::text").extract_first()
    product_price = ''.join(response.css('.product-pricing__main-price  ::text').extract())
    company = "carrefour"

    for name, packaging, price in zip(name_list, packaging_list, product_price):
        item = ScrapybotItem()
        item['ean'] = ean
        item['desc'] = name.replace("\n","").strip() + " " +  packaging
        item['price'] = price
        item['company'] = company

        yield item

The problem is with the price field.

In the shell, the price looks like this, for example:

    In [2]: product_price
    Out[2]: '\n                    5,65€\n\n  \n      '

The script output for the same product:

{'company': 'carrefour',
'desc': "Gel nettoyant anti-imperfections 5 en 1 L'Oréal Paris Men Expert 
 le "
     'tube de 150ml',
 'ean': '\n  1 résultat pour « 3600522418634 »\n',
 'price': '\n'}

Do you know why I don't get the price when I run the script?

Answer 1

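product_price is a string, because you join the selector results here:

    product_price = ''.join(response.css('.product-pricing__main-price  ::text').extract())

Then, when you pass it to zip, the string is iterated one character at a time, so the first item gets only \n, which is the first character of product_price.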

Check this example:

    >>> for i, j, k in zip([1, 2, 3, 4], [5, 6, 7, 8], 'abcd'):
    ...     print(i, j, k)

Output (under Python 3):

    1 5 a
    2 6 b
    3 7 c
    4 8 d
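
One possible fix, sketched under the assumption that each product contributes its own price text: keep the prices as a list of cleaned strings instead of joining them into a single string, so that zip pairs each name with a whole price. The helper `clean_prices` and the sample lists are hypothetical stand-ins for what `response.css(...).extract()` returns; they are not taken from the original spider.

```python
def clean_prices(raw_texts):
    """Strip whitespace from extracted text nodes and drop the empty ones."""
    return [text.strip() for text in raw_texts if text.strip()]


# Hypothetical stand-ins for the lists returned by response.css(...).extract()
name_list = ["\nGel nettoyant\n"]
packaging_list = ["le tube de 150ml"]
raw_prices = ["\n                    5,65€\n\n  \n      "]

# Joining first gives one string; zip then walks it character by character.
joined = "".join(raw_prices)
buggy = [price for _, _, price in zip(name_list, packaging_list, joined)]

# Keeping a list gives one whole price per product.
price_list = clean_prices(raw_prices)
fixed = [price for _, _, price in zip(name_list, packaging_list, price_list)]

print(buggy)  # ['\n']
print(fixed)  # ['5,65€']
```

If a single price is spread over several text nodes, this sketch would split it into several list entries; in that case it is probably safer to join the text nodes per product element rather than across the whole page.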
