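I am trying to replace non-Unicode characters with _, but although this program runs without errors, it does not solve the problem, and I cannot work out why.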
import csv
import unicodedata
import pandas as pd

df = pd.read_csv('/Users/pabbott/Desktop/Unicode.csv', sep = ',',
                 index_col=False, converters={'ClinetEMail':str, 'ClientZip':str,
                 'LocationZip':str, 'LicenseeName': str, 'LocationState':str,
                 'AppointmentType':str, 'ClientCity':str, 'ClientState':str})

data = df
for row in data:
    for val in row:
        try:
            val.encode("utf-8")
        except UnicodeDecodeError:
            replace(val,"_")
data.to_csv('UnicodeExport.csv', sep=',', index=False,
            quoting=csv.QUOTE_NONNUMERIC)
Answer 0 (score: 0)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 4: invalid start byte
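The message above (raised from pd.read_csv) shows that the file is not saved in utf-8. You need to either save the file as utf-8 or read it with the correct encoding. For example (the latter variant), add encoding='windows-1252' to df = pd.read_csv(… as follows: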
df = pd.read_csv('/Users/pabbott/Desktop/Unicode.csv', sep = ',', encoding='windows-1252',
                 index_col=False, converters={'ClinetEMail':str, 'ClientZip':str,
                 'LocationZip':str, 'LicenseeName': str, 'LocationState':str,
                 'AppointmentType':str, 'ClientCity':str, 'ClientState':str})
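To see why the default read fails and why naming the encoding helps, here is a minimal sketch that uses only the standard library. The sample bytes are invented for illustration (they are not taken from the asker's file); they simply place byte 0xa4 at position 4, as in the error message.

```python
# Hypothetical sample data: byte 0xa4 sits at index 4, as in the traceback.
raw = b"abcd\xa4"

# 0xa4 cannot start a UTF-8 sequence, so strict decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = str(exc)

print(utf8_error)

# In Windows-1252, byte 0xa4 is the currency sign (U+00A4), so decoding works.
text = raw.decode("windows-1252")
print(repr(text))
```

Whether windows-1252 is the right choice for a given file is an assumption that should be checked against the source of the data; a wrong single-byte encoding will decode without error but may show the wrong characters.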
With the encoding specified, you can drop the whole for row in data: for val in row: loop, including the try: val.encode("utf-8") check.
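If the goal is still to turn undecodable bytes into _ rather than to pick the right encoding, one possible approach is to decode with a lenient error handler and then swap the replacement character. This is a sketch under that assumption, not the answer's method; decode_with_placeholder is a hypothetical helper name.

```python
def decode_with_placeholder(raw: bytes, placeholder: str = "_") -> str:
    """Decode UTF-8 bytes, substituting `placeholder` for invalid bytes.

    Hypothetical helper for illustration only.
    """
    # errors="replace" turns each undecodable byte into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Caveat: a genuine U+FFFD already present in the data is replaced too.
    return text.replace("\ufffd", placeholder)


cleaned = decode_with_placeholder(b"abcd\xa4ef")
print(cleaned)
```

Valid UTF-8 input passes through unchanged, so the helper only touches bytes that would otherwise raise UnicodeDecodeError. Recent pandas versions (1.3 and later, as far as I know) also accept an encoding_errors argument in read_csv that plays the same role as errors here.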
encoding : str, default None
Encoding to use for UTF when reading/writing (e.g. 'utf-8'). List of Python standard encodings
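Before passing a name as encoding, it can help to confirm that Python recognizes it. The sketch below uses codecs.lookup from the standard library; is_known_encoding is a hypothetical helper name introduced only for this example.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python has a codec registered under `name`."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# 'windows-1252' is an alias; lookup reports the canonical codec name.
canonical = codecs.lookup("windows-1252").name
print(canonical)
print(is_known_encoding("no-such-encoding"))
```

This only checks that the name is valid, not that it matches how the file was actually written.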