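Removing \t from the start and end of a Scrapy response

Time: 2019-03-07 12:09:28

Tags: python scrapy

I want to clean up my Scrapy response. I am building a simple price monitor, but I am having trouble getting a clean price.

I get the following response: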

['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']

Ideally, I would like it to be (a float?):

272.28

I am using Scrapy items to store the values, like this:

def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price').extract()
    item['base_price'] = response.css('span.price.base-price').extract()

Thanks for your help!

3 answers:

Answer 0: (score: 1)

Because the text is inside a list, you first need to take it out of the list and then strip it:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> text = response[0]
>>> text
'\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t'
>>> clean_text = text.strip()
>>> clean_text
'272.28€'
>>> number_text = clean_text.replace("€", "")
>>> number_text
'272.28'
>>> number = float(number_text)
>>> number
272.28

Or as a one-liner:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> float(response[0].strip().replace("€", ""))
272.28
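One detail worth spelling out, since the title asks about removing \t specifically: strip() with no argument removes every kind of leading and trailing whitespace, including the \n at the start. Passing '\t' would leave the left side untouched, because the string begins with \n rather than a tab. A minimal sketch with a shortened sample string (the variable names are only illustrative):

```python
# Shortened version of the value from the question.
raw = ['\n\t\t272.28€\t\t']
text = raw[0]

# strip('\t') only removes tabs. The leading '\n' is not a tab, so
# nothing is removed on the left; only the trailing tabs go away.
only_tabs = text.strip('\t')

# strip() with no argument removes all leading/trailing whitespace.
all_ws = text.strip()

# Drop the trailing currency symbol and convert to a float.
price = float(all_ws.rstrip('€'))

print(repr(only_tabs))  # '\n\t\t272.28€'
print(repr(all_ws))     # '272.28€'
print(price)            # 272.28
```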

Answer 1: (score: 0)

Use this:

def parse_item(self, response):
   item = HobbyItem()
   item['new_price'] = response.css('span.price.new-price::text').get().replace('€', '').strip()
   item['base_price'] = response.css('span.price.base-price::text').get().replace('€', '').strip()

The get() method here returns the first match for the CSS selector, and strip() removes the leading and trailing whitespace. You can read more here.

Answer 2: (score: 0)

After all the help, this is the solution that worked for me (far from the most efficient):

def parse_item(self, response):
    item = HobbyItem()
    if response.css('span.price.new-price::text').extract():
        new_price = response.css('span.price.new-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['new_price'] = final_new_price
    else:
        item['new_price'] = '0'
    if response.css('span.base-price::text').extract():
        new_price = response.css('span.base-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['base_price'] = final_new_price
    else:
        item['base_price'] = '0'
    if response.css('span.price::text').extract():
        new_price = response.css('span.price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['price'] = final_new_price
    else:
        item['price'] = '0'
    item['name'] = response.css('h1>span::text').extract()
    item['url'] = response.url
    yield item
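The three near-identical branches above could probably be folded into one small helper. The following is only a sketch, not the poster's code: parse_price is a hypothetical name, it assumes prices use a dot as the decimal separator with no thousands separators, and it returns the float 0.0 as the fallback instead of the string '0' so that the field always has the same type. Unlike the original, it uses the first extracted string that contains a number rather than always taking element 0.

```python
import re

# Matches an integer or a dot-decimal number such as 272 or 272.28.
_NUMBER = re.compile(r'\d+(?:\.\d+)?')


def parse_price(values, default=0.0):
    """Return the first number found in a list of extracted strings.

    `values` is the list returned by .extract(); whitespace and the
    currency symbol are ignored because only the numeric part is matched.
    Returns `default` when the list is empty or holds no number.
    """
    for value in values:
        match = _NUMBER.search(value)
        if match:
            return float(match.group())
    return default


print(parse_price(['\n\t\t272.28€\t\t']))  # 272.28
print(parse_price([]))                     # 0.0

# Inside parse_item, each field would then be a single line, e.g.:
#   item['new_price'] = parse_price(
#       response.css('span.price.new-price::text').extract())
```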