What is the 'charmap' codec shown in Python's UnicodeErrors?

Asked: 2018-02-20 09:41:11

Tags: python python-3.x character-encoding decode encode

I am running some simple code to see which encodings can decode a particular file, like this:

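import os
import tempfile

# NOTE: `csvs` (an iterable of paths to the CSV files under test) is
# defined elsewhere and is not shown in this snippet.
encodings = ('cp737', 'cp869', 'cp875', 'cp1253', 'iso2022_jp_2', 'iso8859_7',
             'mac_greek', 'utf-8')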

def test_encoding():
    with tempfile.TemporaryDirectory() as tmp_dir:
        for c in csvs:
            for encoding in encodings:
                try:
                    with open(c, 'r', encoding=encoding) as f:
                        content = f.read()
                except UnicodeDecodeError as e:
                    print(encoding, e) # <---- print from here
                    continue
                csv_out = os.path.join(tmp_dir, os.path.basename(
                    c[:-4]) + '_%s.csv' % encoding)
                with open(csv_out, 'w', encoding=encoding,
                          newline='\n') as f:
                    f.write(content)
        input('Files created in %s' % tmp_dir)

This prints:

cp869 'charmap' codec can't decode byte 0x83 in position 28: character maps to <undefined>
cp1253 'charmap' codec can't decode byte 0x8c in position 26: character maps to <undefined>
iso2022_jp_2 'iso2022_jp_2' codec can't decode byte 0xce in position 18: illegal multibyte sequence
iso8859_7 'charmap' codec can't decode byte 0xae in position 84: character maps to <undefined>

So what is the 'charmap' codec? And why does the message sometimes say 'charmap' codec can't..., while for iso2022_jp_2 it prints the name of the encoding itself?

I am on:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)]
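
The behaviour can be reproduced without any files. The sketch below is only an illustration, not an authoritative explanation: it decodes a single offending byte taken from the output above and reads the `encoding` attribute of the resulting `UnicodeDecodeError`. The helper name `decode_error_codec_name` is made up for this example. As far as I can tell, in CPython the single-byte code pages such as cp1253 are implemented as lookup tables on top of a shared character-map ("charmap") decoder, which would explain why the exception names 'charmap', whereas iso2022_jp_2 is a separate multibyte codec that reports its own name.

```python
def decode_error_codec_name(data: bytes, encoding: str):
    """Return the codec name stored on the UnicodeDecodeError,
    or None if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.encoding is the name that appears in "'<name>' codec can't decode ..."
        return e.encoding
    return None


# Bytes taken from the error output in the question.
for enc, raw in (('cp1253', b'\x8c'), ('iso2022_jp_2', b'\xce')):
    print(enc, '->', decode_error_codec_name(raw, enc))
```

In this sketch the name in the message comes from the decoder that raised the error, not from the `encoding=` argument passed to `open()`.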

0 answers:

No answers yet