Question

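I am trying to export some data from a database to CSV so that I can move it into a cloud data warehouse. However, when I run it, it always quits after 36,599 rows and gives me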

UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 62: character maps to <undefined>

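I found that the string causing the problem is "Rejected-Case No. doesn’t match", and I think the apostrophe is what triggers it. I don't know why it causes this problem and I cannot find a fix. Does anyone know how to solve this? The code I am using is: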

import csv
import time

import pymysql

db = pymysql.connect(host='host', port=3306, user="user", passwd="secret",
                     db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select fields from table""")

emails = cur.fetchall()
# Raw string so that \f and \t in the Windows path are not read as escapes
with open(r'O:\file\path\to\File_name.csv', 'w') as fileout:
    writer = csv.writer(fileout)
    writer.writerows(emails)
time.sleep(1)

Answer 1

Since you did not show the code that causes the error, I can only guess.

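The only established fact is that the string 'Rejected-Case No. doesn’t match' contains a "’", which is the Unicode character U+2019, RIGHT SINGLE QUOTATION MARK. In the Windows cp1252 code page, this character does have the code 0x92.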
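As a quick check of that claim, the following minimal sketch looks up the character's Unicode name and its byte encodings using only the standard library. The variable names are illustrative.

```python
import unicodedata

QUOTE = "\u2019"  # the curly apostrophe in "doesn’t"

# Official Unicode name of the character
name = unicodedata.name(QUOTE)

# The same character as bytes in two different encodings
cp1252_bytes = QUOTE.encode("cp1252")  # single byte 0x92
utf8_bytes = QUOTE.encode("utf-8")     # three bytes

print(name, cp1252_bytes, utf8_bytes)
```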

It looks as if somewhere you have a byte string encoded in the cp1252 charset that was not correctly decoded into a unicode string.
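One way this kind of error can arise is sketched below, assuming the cp1252 bytes were decoded as latin1 somewhere along the way (that is a guess about the cause, not something the question confirms). The byte 0x92 then becomes the control character U+0092, which cp1252 cannot encode when the file is written.

```python
# Bytes as a cp1252 source would store them
raw = "doesn\u2019t".encode("cp1252")  # b'doesn\x92t'

# Assumed mistake: the bytes are decoded as latin1 instead of cp1252
misdecoded = raw.decode("latin1")      # contains U+0092, not U+2019

failure = None
try:
    # Writing to a file opened with a cp1252 default encoding does this
    misdecoded.encode("cp1252")
except UnicodeEncodeError as exc:
    failure = exc

print(repr(misdecoded), failure)
```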

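What to do:

There are solutions. Unfortunately, they depend on the Python version you are using (2 or 3), and without knowing anything about the code I can only give general advice:

Identify the input charset (the one the database hands to the Python script)
Identify the output charset (what you want the csv module to write)
Use explicit conversions so that the correct charset is passed along
Optionally, use errors='replace' in the encode/decode calls to avoid UnicodeError exceptions.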
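The steps above can be sketched for Python 3 as follows. This is only an illustration: `to_text` and `write_csv` are hypothetical helper names, and the defaults (cp1252 in, UTF-8 out) are assumptions that would need to match the actual database and target system.

```python
import csv


def to_text(field, input_charset="cp1252"):
    """Decode bytes with an explicit charset; pass other values through."""
    if isinstance(field, bytes):
        return field.decode(input_charset, errors="replace")
    return field


def write_csv(path, rows, input_charset="cp1252", output_charset="utf-8"):
    """Write rows to a CSV file with an explicit output charset."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", encoding=output_charset,
              errors="replace", newline="") as fileout:
        writer = csv.writer(fileout)
        for row in rows:
            writer.writerow([to_text(f, input_charset) for f in row])
```

With these helpers, a call such as `write_csv("out.csv", cur.fetchall())` would replace the bare `open(..., 'w')` in the question.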

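If you are using Python 3, I will assume that the problem lies in how the text from the database was decoded to unicode. The right single quotation mark is U+2019, but in the string handed to Python it appears as '\x92', which is its cp1252 byte value. A quick and dirty fix is to force an encode/decode pass to get a correct unicode string. Your code could become: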

import csv

import pymysql

db = pymysql.connect(host='host', port=3306, user="user", passwd="secret",
                     db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select fields from table""")

charset = 'cp1252'   # or 'utf8' depending on what you want in the csv file
# Raw string so that \f and \t in the Windows path are not read as escapes
with open(r'O:\file\path\to\File_name.csv', 'w', encoding=charset,
          errors='replace', newline='') as fileout:
    writer = csv.writer(fileout)
    for row in cur.fetchall():
        # Only text fields are re-decoded; numbers, dates and NULLs pass through
        writer.writerow([field.encode('latin1').decode('cp1252', errors='replace')
                         if isinstance(field, str) else field
                         for field in row])

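encode('latin1').decode('cp1252') is just a trick to repair a Python 3 string whose characters carry the code of a byte encoding. It works because the latin1 encoding is a no-op for all code points below 256: each character becomes the byte with the same value.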
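A small self-contained demonstration of the round trip is shown below. `repair` is a hypothetical helper name. The guard for already-correct strings is an addition: text that contains code points above 255 (such as a genuine U+2019) cannot be encoded as latin1, so it is returned unchanged.

```python
def repair(text):
    """Re-interpret latin1-decoded text as cp1252."""
    try:
        raw = text.encode("latin1")  # maps code points 0..255 to equal bytes
    except UnicodeEncodeError:
        return text                  # already holds characters above U+00FF
    return raw.decode("cp1252", errors="replace")


broken = "doesn\x92t match"          # U+0092 where U+2019 was intended
fixed = repair(broken)
print(repr(broken), "->", repr(fixed))
```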

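The errors='replace' option asks Python never to raise a UnicodeError exception, and instead to replace the offending character with '?' for byte strings, or with the official Unicode REPLACEMENT CHARACTER U+FFFD '�' for unicode strings.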
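The two replacement behaviours can be seen in a short sketch. The sample values are illustrative: ASCII is used only because it cannot represent the curly apostrophe, and 0x81 is one of the bytes that cp1252 leaves undefined.

```python
text = "doesn\u2019t"

# Encoding: the unencodable character becomes '?' in the byte string
as_ascii = text.encode("ascii", errors="replace")

# Decoding: the undefined byte becomes U+FFFD in the unicode string
undefined = b"\x81".decode("cp1252", errors="replace")

print(as_ascii, repr(undefined))
```

Note that the replacement is lossy: once a character has been turned into '?' or U+FFFD, the original cannot be recovered from the output.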

It might be cleaner to use the charset option of pymysql.connect. Unfortunately, I have never used a MySQL database from Python...
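Setting the charset at connection time might look like the untested sketch below. It assumes PyMySQL's `charset` connection argument and the MySQL charset name `utf8mb4`. The host, user and password are the placeholders from the question, and `connection_kwargs` and `connect` are hypothetical helper names. Whether this alone fixes the export depends on how the data was stored in the first place.

```python
def connection_kwargs(charset="utf8mb4"):
    """Build connection arguments with an explicit client charset."""
    return dict(
        host="host", port=3306, user="user", password="secret",
        database="db", charset=charset, autocommit=True,
    )


def connect(charset="utf8mb4"):
    """Open a connection; requires the third-party PyMySQL package."""
    import pymysql  # imported lazily so the sketch loads without it
    return pymysql.connect(**connection_kwargs(charset))
```

If the driver then returns correctly decoded strings, the CSV file still has to be opened with an explicit `encoding` that can represent them, such as 'utf-8'.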

How to fix the "character maps to <undefined>" Unicode error
