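I need to process some Excel files that contain a lot of "−" ('\u2212', the Unicode minus sign) along with other characters. After many attempts, I can't even print the character to the screen or save it to a file: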
a='−'
print(a.encode('utf-8')) # prints b'\xe2\x88\x92'
print(a) # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence
with open('test.txt','w') as file:
    file.write(a) # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence
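The page https://docs.python.org/3.4/howto/unicode.html shows how to replace such characters with something else, but I need to print the character itself, or at least write it to a file correctly: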
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
How can I do this?
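For reference, a minimal sketch of one possible way around this, assuming the failure comes from the GBK default encoding used by the console and by `open()` rather than from the data itself. The helper names `write_utf8`, `read_utf8`, and `safe_print` are hypothetical and exist only for illustration; the temporary file path is likewise an assumption.

```python
import os
import sys
import tempfile

MINUS = '\u2212'  # U+2212 MINUS SIGN, the character from the question


def write_utf8(path, text):
    # An explicit encoding bypasses the locale default (GBK in the question).
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def read_utf8(path):
    # Read back with the same explicit encoding.
    with open(path, encoding='utf-8') as f:
        return f.read()


def safe_print(text, stream=None):
    # Fallback for consoles whose encoding lacks the character:
    # unencodable characters appear as \uXXXX escapes instead of raising.
    if stream is None:
        stream = sys.stdout
    encoding = stream.encoding or 'utf-8'
    data = text.encode(encoding, errors='backslashreplace')
    print(data.decode(encoding), file=stream)


# Round-trip through a temporary file (illustrative path only).
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
write_utf8(path, MINUS)
round_trip = read_utf8(path)
safe_print(round_trip)
```

The file then holds the character as UTF-8 bytes, and any program reading it needs to use the same encoding. On Python 3.7 or later, `sys.stdout.reconfigure(encoding='utf-8')` may be another option for the console, provided the terminal can actually display UTF-8.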