Question

I'm working with pandas, and every time a "special" character such as é appears, it raises an encoding error. This is my code:

import pandas as pd


df = pd.read_csv(r'movies analysis\movie_metadata.csv', encoding='utf-8')
print(df.loc[df['title_year'] == 2015])

This is the error I get:

 File "D:\ANACONDA\lib\encodings\cp1255.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 645: character maps to <undefined>

There are many threads on this topic, but none of them offered a solution that worked for me.

Answer 1

What I ended up doing was "normalizing" all characters to ASCII. It's not an ideal approach, but it worked for me:

import unicodedata

unicodedata.normalize('NFD', Data_U_Want_To_Normalize).encode('ascii', 'ignore')

So a letter like é becomes e.
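
The normalization described in this answer could be wrapped in a small helper. This is only a sketch: the name to_ascii is made up for illustration, and the trailing .decode('ascii') is an addition so that the result is a str rather than bytes. Note that characters with no ASCII base letter are dropped silently.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters (NFD), then drop everything non-ASCII."""
    decomposed = unicodedata.normalize('NFD', text)
    # encode() returns bytes; decode() turns them back into a str
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('caf\xe9'))  # 'café' with the accent stripped
```

To apply it to a DataFrame, something like df[col].map(to_ascii) on a column that holds only strings should work; col here stands for whichever column contains the accented text.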

Answer 2

Adding export LANG=zh_CN.utf8 or export LANG=us_EN to your bashrc or shell script worked for me. My script is:

import pandas as pd
pd.read_excel(f) 
# the file contains Chinese characters
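
The LANG variable affects which encoding Python picks for its output on Unix-like shells. The traceback in the question points at cp1255, a Windows code page, so a Python-side variant of the same idea may be more practical there. The sketch below is an assumption-laden illustration, not a confirmed fix: can_encode is a hypothetical helper, and sys.stdout.reconfigure requires Python 3.7 or later.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = 'caf\xe9'  # contains é, the character from the traceback

# cp1255 has no mapping for é, which is why print() fails on that console
print(can_encode(sample, 'cp1255'))
# UTF-8 can represent any Unicode character
print(can_encode(sample, 'utf-8'))

# Assumption: Python 3.7+, where sys.stdout offers reconfigure()
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
print(sample)
```

Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python has a similar effect without touching the script. Whether the characters then display correctly still depends on the terminal.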

pandas UnicodeEncodeError: 'charmap' codec can't encode character
