I ran some simple code to see which encodings can decode specific files, like this:
import os
import tempfile

encodings = ('cp737', 'cp869', 'cp875', 'cp1253', 'iso2022_jp_2', 'iso8859_7',
             'mac_greek', 'utf-8')

def test_encoding():
    with tempfile.TemporaryDirectory() as tmp_dir:
        for c in csvs:  # csvs: list of CSV file paths, defined elsewhere
            for encoding in encodings:
                try:
                    with open(c, 'r', encoding=encoding) as f:
                        content = f.read()
                except UnicodeDecodeError as e:
                    print(encoding, e)  # <---- print from here
                    continue
                # Write the decoded text back out using the same encoding
                csv_out = os.path.join(tmp_dir, os.path.basename(
                    c[:-4]) + '_%s.csv' % encoding)
                with open(csv_out, 'w', encoding=encoding,
                          newline='\n') as f:
                    f.write(content)
        # Pause so the files can be inspected before tmp_dir is removed
        input('Files created in %s' % tmp_dir)
This prints:
cp869 'charmap' codec can't decode byte 0x83 in position 28: character maps to <undefined>
cp1253 'charmap' codec can't decode byte 0x8c in position 26: character maps to <undefined>
iso2022_jp_2 'iso2022_jp_2' codec can't decode byte 0xce in position 18: illegal multibyte sequence
iso8859_7 'charmap' codec can't decode byte 0xae in position 84: character maps to <undefined>
So what is the 'charmap' codec? Why does the message sometimes say 'charmap' codec can't..., while for iso2022_jp_2 it prints the name of the encoding?
I am on
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)]
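
A minimal reproduction that does not depend on the CSV files, in case it helps. It only reads the `encoding` attribute that `UnicodeDecodeError` carries, which is the name shown at the start of the message. The helper name `failing_codec_name` and the one-byte samples are made up for illustration; the bytes are taken from the error messages above. As far as I can tell, the single-byte code pages in the standard library are table-driven and decode through a shared `codecs.charmap_decode` routine, which would explain why they report 'charmap', but treat that as my reading rather than a definitive answer.

```python
def failing_codec_name(data, encoding):
    """Return the codec name recorded on the UnicodeDecodeError,
    or None if the bytes decode without error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.encoding is the name printed at the start of the error message
        return e.encoding
    return None


# Sample bytes taken from the error messages in the question
for enc, sample in (('cp869', b'\x83'), ('iso2022_jp_2', b'\xce')):
    print(enc, '->', failing_codec_name(sample, enc))
```

If the shared-routine reading is right, the name in the message identifies the decoding routine that failed, not necessarily the encoding that was requested, so the requested encoding has to be printed separately (as the `print(encoding, e)` line above already does).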