Codec can't encode character (Python 3)

Time: 2019-11-05 20:58:45

Tags: python python-3.x web-scraping beautifulsoup

I want to scrape the names and prices from this website:

https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniqBStoreParam1=val1&wid=11.productCard.PMU_V2

Both the name and the price are inside div tags.

Name:

[Screenshot: HTML of the name element]

Price:

[Screenshot: HTML of the price element]

Printing the name works fine, but printing the price gives me an error:

Traceback (most recent call last):
  File "c:\File.py", line 37, in <module>
    print(price.text)
  File "C:\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 0: character maps to <undefined>
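The traceback can be reproduced without the scraper. The sketch below encodes two hypothetical strings, a name and a price, with cp1252, the codec whose `cp1252.py` file appears in the traceback. The sample values and the `encodes_in` helper are made up for illustration; only the leading rupee sign matters.

```python
# Hypothetical sample values; the real page text will differ.
name = "Example Laptop 15"   # plain ASCII, every character exists in cp1252
price = "\u20b949,990"       # starts with the Indian rupee sign U+20B9


def encodes_in(text, encoding="cp1252"):
    """Return True if `text` can be encoded with `encoding`, else report why not."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError as err:
        # !a prints the offending character as an ASCII escape, so this
        # message itself cannot fail on a cp1252 console.
        print(f"{err.reason}: {text[err.start]!a} at position {err.start}")
        return False


print(encodes_in(name))   # True
print(encodes_in(price))  # prints the reason, then False
```

The name contains only characters that cp1252 can represent, so it prints; the price fails on its very first character, which matches `position 0` in the traceback.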

Code:

import requests
from bs4 import BeautifulSoup

# Fetch the listing page and parse its HTML
response = requests.get("https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniq")
soup = BeautifulSoup(response.text, 'html.parser')
# Each product card is an <a> element with class _31qSD5
for a in soup.find_all('a', href=True, attrs={'class': '_31qSD5'}):
    name = a.find('div', attrs={'class': '_3wU53n'})
    price = a.find('div', attrs={'class': '_1vC4OE _2rQ-NK'})
    print(name.text)   # works
    print(price.text)  # raises the UnicodeEncodeError shown above


What is the difference between the two?

So why does one of them give me an error and the other doesn't?

1 Answer:

Answer 0 (score: 1)

This error occurs because Python has trouble with that currency symbol. print() encodes the text using the encoding of sys.stdout, which in your environment is cp1252 (the 'charmap' codec named in the traceback), and the Indian rupee sign '\u20b9' has no entry in that charmap. The names print fine because all of their characters exist in cp1252. If we change your last print statement to the following, it no longer raises an error:

print(str(price.text.encode("utf-8")))
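A short sketch of what that statement actually prints, using a hypothetical price string in place of `price.text`: encoding to UTF-8 produces bytes, and `str()` of a bytes object is its `b'...'` representation, which is pure ASCII and therefore safe for a cp1252 output.

```python
price_text = "\u20b949,990"  # hypothetical stand-in for price.text

raw = price_text.encode("utf-8")  # bytes: the rupee sign becomes 3 bytes
print(str(raw))                   # prints: b'\xe2\x82\xb949,990'

# The original text is still recoverable from the bytes.
assert raw.decode("utf-8") == price_text
```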

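Since this output (the raw UTF-8 bytes, starting with b'\xe2\x82\xb9') is not very pretty and may not be usable, I would personally strip the symbol before printing. If you really want Python to print the Indian rupee sign, you can add it to your charmap: follow the steps in this post to add a custom entry to the charmap.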
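Stripping the symbol before printing could look like the sketch below. `clean_price` and `price_to_int` are hypothetical helpers, not part of BeautifulSoup; the second one assumes prices are written with commas as thousands separators and no decimal part.

```python
RUPEE = "\u20b9"  # Indian rupee sign, the character cp1252 cannot encode


def clean_price(text):
    """Drop the rupee sign and surrounding whitespace from a price string."""
    return text.replace(RUPEE, "").strip()


def price_to_int(text):
    """Hypothetical helper: turn a rupee price like '49,990' into the int 49990.

    Assumes comma thousands separators and whole rupees only.
    """
    return int(clean_price(text).replace(",", ""))


print(clean_price("\u20b949,990"))   # prints: 49,990
print(price_to_int("\u20b949,990"))  # prints: 49990
```

In the scraping loop, `print(clean_price(price.text))` would then print only digits and commas, which cp1252 can always encode.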