Question

I want to clean up my Scrapy response. I'm building a simple price monitor, but I'm having trouble getting a clean price.

I get the following response:

['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']

Ideally, I'd like it to be (a float?):

272.28

I'm using Scrapy items to store the values, for example:

def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price').extract()
    item['base_price'] = response.css('span.price.base-price').extract()

Thanks for your help!

Answer 1

Because the text appears to be inside a list, you first need to take the string out of the list and then strip it:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> text = response[0]
>>> text
'\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t'
>>> clean_text = text.strip()
>>> clean_text
'272.28€'
>>> number_text = clean_text.replace("€", "")
>>> number_text
'272.28'
>>> number = float(number_text)
>>> number
272.28

Or as a one-liner:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> float(response[0].strip().replace("€", ""))
272.28
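If the currency symbol or surrounding text can vary, a regular expression is one alternative to chaining strip() and replace(). The following is only a sketch: the name first_price is hypothetical, and it assumes prices use a dot as the decimal separator and contain no thousands separators.

```python
import re

# Matches an integer or a decimal number written with a dot, e.g. "272.28".
# Assumption: no thousands separators and no comma decimals.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def first_price(values):
    """Return the first number found in a list of strings as a float.

    `values` is a list of strings such as the one returned by .extract().
    Returns None when the list is empty or no string contains a number.
    """
    for value in values:
        match = _NUMBER.search(value)
        if match:
            return float(match.group())
    return None


print(first_price(['\n\t\t\t\t272.28€\t\t\t']))
```

Because the pattern searches for the digits directly, the tabs, newlines, and the euro sign do not need to be removed first.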

Answer 2

Use this:

def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price::text').get().replace('€', '').strip()
    item['base_price'] = response.css('span.price.base-price::text').get().replace('€', '').strip()

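The get() method here retrieves the first element that matches the CSS selector, and the strip() method removes the surrounding whitespace characters. You can read more here.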
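One thing to keep in mind is that get() returns None when the selector matches nothing, so calling .replace() directly on its result would raise an AttributeError on pages without that price. A minimal sketch of a guard follows; clean_price is a hypothetical helper name, not part of Scrapy.

```python
from typing import Optional


def clean_price(raw: Optional[str]) -> Optional[float]:
    """Convert a raw price string such as '\n\t272.28€\t' to a float.

    Returns None when `raw` is None (no match) or contains no text
    once the euro sign and whitespace have been removed.
    """
    if raw is None:
        return None
    text = raw.replace("€", "").strip()
    return float(text) if text else None


# Intended use inside a spider callback (not executed here):
# item['new_price'] = clean_price(response.css('span.price.new-price::text').get())
print(clean_price("\n\t\t272.28€\t\t"))
print(clean_price(None))
```

Returning None rather than a placeholder keeps "no price found" distinguishable from a real price of zero; whether that suits the item pipeline is a design choice.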

Answer 3

After all the help, this is the solution that worked for me (far from the most efficient):

def parse_item(self, response):
    item = HobbyItem()
    if response.css('span.price.new-price::text').extract():
        new_price = response.css('span.price.new-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['new_price'] = final_new_price
    else:
        item['new_price'] = '0'
    if response.css('span.base-price::text').extract():
        new_price = response.css('span.base-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['base_price'] = final_new_price
    else:
        item['base_price'] = '0'
    if response.css('span.price::text').extract():
        new_price = response.css('span.price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['price'] = final_new_price
    else:
        item['price'] = '0'
    item['name'] = response.css('h1>span::text').extract()
    item['url'] = response.url
    yield item
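The three nearly identical branches above could be folded into one helper and a loop. This is only a sketch of that idea, not the answerer's code: extract_price, PRICE_SELECTORS, and build_item are hypothetical names, the missing-price default is the float 0.0 instead of the string '0', and getall() is used where the original calls extract().

```python
def extract_price(response, css, default=0.0):
    """Return the first text match for `css` as a float, or `default`."""
    raw = response.css(css).get()  # None when nothing matches
    if raw is None:
        return default
    return float(raw.strip().replace("€", ""))


# Item field -> CSS selector, copied from the solution above.
PRICE_SELECTORS = {
    "new_price": "span.price.new-price::text",
    "base_price": "span.base-price::text",
    "price": "span.price::text",
}


def build_item(response, item):
    """Fill `item` (any dict-like object, e.g. a Scrapy Item) and return it."""
    for field, css in PRICE_SELECTORS.items():
        item[field] = extract_price(response, css)
    item["name"] = response.css("h1>span::text").getall()
    item["url"] = response.url
    return item
```

Inside the spider, parse_item would then reduce to `yield build_item(response, HobbyItem())`, and each selector is evaluated once instead of twice.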

Remove \t from the beginning and end of a Scrapy response

3 answers: