Question

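I am using Python 2.7 to read data from a MySQL table. In MySQL the name looks like this:

Garasa, Ángel.

But when I print it in Python, the output is

Garasa, �ngel

The character set name in MySQL is utf8. This is my Python code: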

# coding: utf-8

import MySQLdb

connection = MySQLdb.connect(host="localhost", user="root", passwd="root", db="jmdb")
cursor = connection.cursor()
cursor.execute("select * from actors where actorid=672462;")
data = cursor.fetchall()
for row in data:
    # row[4] holds the actor's name as a byte string
    print "IMDB Name=", row[4]
    wiki = "".join(row[4])
    print wiki

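I have tried decoding it, but I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte

I have read about decoding and UTF-8 but could not find a solution.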

Answer 1

I think the right character mapping in your case is cp1252:

>>> s = 'Garasa, Ángel.'
>>> s.decode('utf-8')

Traceback (most recent call last):
  File "<pyshell#63>", line 1, in <module>
    s.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc1 in position 8: invalid start byte

>>> s.decode('cp1252')
u'Garasa, \xc1ngel.'
>>>
>>> print s.decode('cp1252')
Garasa, Ángel.
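
The session above depends on how the console encodes the literal 'Á'. As a minimal sketch, assuming the driver hands back the single byte 0xC1 for that character (which is what the error message suggests), the same behaviour can be reproduced with explicit bytes. The variable names here are illustrative only.

```python
# Assumed raw value: 0xC1 is 'Á' in cp1252 and latin-1, but is never valid in UTF-8.
raw = b'Garasa, \xc1ngel.'

try:
    raw.decode('utf-8')
    utf8_error_position = None
except UnicodeDecodeError as err:
    # err.start is the index of the first byte that could not be decoded
    utf8_error_position = err.start

# Decoding with cp1252 succeeds and yields a unicode string
name = raw.decode('cp1252')

# Re-encoding as UTF-8 shows what the bytes would look like if they really were UTF-8
utf8_bytes = name.encode('utf-8')
```

Position 8 is the byte right after "Garasa, ", which matches the position reported in the question's traceback.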

Edit: it might also be latin-1:

>>> s.decode('latin-1')
u'Garasa, \xc1ngel.'
>>> print s.decode('latin-1')
Garasa, Ángel.

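This is because the cp1252 and latin-1 code pages agree on every code except the range 128 to 159, so both decode this name identically.

Quoting from this source (latin-1):

The Windows-1252 codepage coincides with ISO-8859-1 for all codes except the range 128 to 159 (hex 80 to 9F), where the little-used C1 controls are replaced with additional characters, including all the missing characters provided by ISO-8859-15.

And this one (cp1252):

This character encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range.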
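
The overlap described in these quotes can be checked directly. The following is a rough sketch, not part of the original answer, that decodes every single-byte value with both codecs and collects the values where the results differ; the function name differing_bytes is made up for illustration.

```python
def differing_bytes():
    """Return the byte values that cp1252 and latin-1 decode differently."""
    diffs = []
    for value in range(256):
        # bytes(bytearray(...)) gives a one-byte string on Python 2 and 3
        raw = bytes(bytearray([value]))
        latin = raw.decode('latin-1')  # latin-1 maps every byte value
        try:
            cp = raw.decode('cp1252')
        except UnicodeDecodeError:
            cp = None  # Python's cp1252 codec leaves a few slots undefined
        if cp != latin:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
```

Since 0xC1 lies outside the differing range, either codec yields 'Á' for the name in the question.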

Answer 2

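Get the MySQL driver to return Unicode strings. This means you do not have to handle decoding in your code.

Simply set use_unicode=True in the connection parameters. If the table has been set up with a specific encoding, set the charset attribute accordingly.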
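
A minimal sketch of what that might look like with MySQLdb, reusing the credentials and column index from the question. The helper names connection_options and fetch_actor_name are hypothetical, and charset="utf8" is an assumption based on the character set the question reports; adjust it if the table actually stores another encoding.

```python
def connection_options(charset="utf8"):
    """Build keyword arguments for MySQLdb.connect (hypothetical helper)."""
    return dict(
        host="localhost",
        user="root",
        passwd="root",
        db="jmdb",
        use_unicode=True,  # ask the driver to return unicode strings
        charset=charset,   # assumed to match the encoding of the stored data
    )


def fetch_actor_name(actor_id):
    """Return the name column (index 4, as in the question) for one actor."""
    import MySQLdb  # imported here so the helper above works without the driver

    connection = MySQLdb.connect(**connection_options())
    try:
        cursor = connection.cursor()
        # Parameterized query; MySQLdb uses %s placeholders
        cursor.execute("SELECT * FROM actors WHERE actorid = %s", (actor_id,))
        row = cursor.fetchone()
        return row[4] if row else None
    finally:
        connection.close()
```

With these options, fetch_actor_name(672462) should return a unicode object, so no manual .decode() call is needed before printing.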
