I want to clean up my Scrapy response. I'm building a simple price monitor, but I'm having trouble getting a clean price.
I get the following response:
['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
Ideally, I would like it to be (a float?):
272.28
I'm using Scrapy items to store the values, like this:
def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price').extract()
    item['base_price'] = response.css('span.price.base-price').extract()
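Thanks for your help!
Answer 0 (score: 1)
Because the text is inside a list, you first need to take it out of the list and then strip the surrounding whitespace: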
>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> text = response[0]
'\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t'
>>> clean_text = text.strip()
'272.28€'
>>> number_text = clean_text.replace("€", "")
'272.28'
>>> number = float(number_text)
272.28
Or as a one-liner:
>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> float(response[0].strip().replace("€", ""))
272.28
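The steps above can also be wrapped in a small helper. This is only a sketch: the name `parse_price` is made up for illustration, and it assumes the price uses a dot as the decimal separator and has no thousands separators. It uses a regular expression instead of `replace("€", "")`, so it does not depend on the currency symbol, and it returns `None` when the list is empty or contains no number.

```python
import re


def parse_price(extracted):
    """Return the first number found in a list of extracted strings as a float.

    Hypothetical helper: assumes a dot decimal separator and no thousands
    separators. Returns None if no number is found.
    """
    for raw in extracted:
        match = re.search(r"\d+(?:\.\d+)?", raw)
        if match:
            return float(match.group())
    return None


response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
price = parse_price(response)
```

With the list from the question, `price` ends up as the float `272.28`.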
Answer 1 (score: 0)
Use this:
def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price::text').get().replace('€', '').strip()
    item['base_price'] = response.css('span.price.base-price::text').get().replace('€', '').strip()
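The get() method here retrieves the first match for the CSS selector, replace() drops the euro sign, and strip() removes the surrounding whitespace. Note that the result is still a string, and that get() returns None when nothing matches. You can read more here.
Answer 2 (score: 0)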
After all the help, this is the solution that worked for me (far from the most efficient):
def parse_item(self, response):
    item = HobbyItem()
    if response.css('span.price.new-price::text').extract():
        new_price = response.css('span.price.new-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['new_price'] = final_new_price
    else:
        item['new_price'] = '0'
    if response.css('span.base-price::text').extract():
        new_price = response.css('span.base-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['base_price'] = final_new_price
    else:
        item['base_price'] = '0'
    if response.css('span.price::text').extract():
        new_price = response.css('span.price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['price'] = final_new_price
    else:
        item['price'] = '0'
    item['name'] = response.css('h1>span::text').extract()
    item['url'] = response.url
    yield item
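Since the three price blocks above differ only in the selector and the item field, the repetition could probably be factored out. The sketch below is one possible shape, not a tested drop-in replacement: `to_float_price`, `PRICE_SELECTORS` and `extract_prices` are hypothetical names, and the fallback is the float `0.0` rather than the string `'0'`, so that every price field has the same type. It only relies on the response having a `.css(...).extract()` interface, as used in the code above.

```python
def to_float_price(texts, default=0.0):
    """Convert the first extracted text node to a float, or return `default`.

    Hypothetical helper: strips whitespace and the euro sign, as in the
    code above.
    """
    if not texts:
        return default
    return float(texts[0].strip().replace("€", ""))


# Hypothetical mapping of item fields to the CSS selectors used above.
PRICE_SELECTORS = {
    "new_price": "span.price.new-price::text",
    "base_price": "span.base-price::text",
    "price": "span.price::text",
}


def extract_prices(response):
    """Return a dict of price fields from any object with a Scrapy-like .css()."""
    return {
        field: to_float_price(response.css(selector).extract())
        for field, selector in PRICE_SELECTORS.items()
    }
```

Inside `parse_item`, the resulting dict could then be copied into the item, for example with `for field, value in extract_prices(response).items(): item[field] = value`.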