UnicodeEncodeError: 'charmap' codec can't encode characters - Python and Spyder

Time: 2018-04-16 05:31:56

Tags: python utf-8 char spyder

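I am trying to follow this NLP example: https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e

The dataset and Jupyter notebook file can be found here: https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb

The provided code was written for Python 2. However, I tried to adapt it to Python 3 and ran into a UnicodeEncodeError.

I tried it with Python 2 and it works as expected.

Note: For this tutorial I installed TensorFlow (CPU), Keras, and the NLTK package. I also took out the "codecs" import because it is not needed in Python 3.

I will only show the part that fails. I am not using Jupyter Notebook; I am using Spyder.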

# -*- coding: utf-8 -*-
import keras
import nltk
import pandas as pd
import numpy as np
import re
#import codecs 

input_file = open("socialmedia_relevant_cols.csv", 'r', encoding='utf-8', errors='replace')
output_file = open("socialmedia_relevant_cols_clean.csv", "w")

def sanitize_characters(raw, clean):
    for line in input_file:
        out = line
        output_file.write(line)

sanitize_characters(input_file, output_file)

Here is the error:

  Traceback (most recent call last):

  File "<ipython-input-6-1c76dfb0b166>", line 1, in <module>
    runfile('C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py', wdir='C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master')

  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py", line 26, in <module>
    sanitize_characters(input_file, output_file)

  File "C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py", line 24, in sanitize_characters
    output_file.write(line)

  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>
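
The last frame of the traceback is in encodings\cp1252.py, which suggests the output file (opened without an encoding argument) is being written with the Windows default encoding rather than UTF-8. The following is a minimal sketch of one possible fix under that assumption, not a confirmed answer: it passes encoding='utf-8' when opening the output file as well. The temporary file names and sample bytes are made up for the demonstration, and the function takes file paths instead of the global file objects used above.

```python
import os
import tempfile


def sanitize_characters(src_path, dst_path):
    """Copy src_path to dst_path, replacing undecodable bytes on read
    and writing the result explicitly as UTF-8."""
    with open(src_path, "r", encoding="utf-8", errors="replace") as raw, \
            open(dst_path, "w", encoding="utf-8") as clean:
        for line in raw:
            clean.write(line)


# Hypothetical demo data: one valid UTF-8 line and one line with an invalid byte.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "in.csv")
dst = os.path.join(tmp_dir, "out.csv")
with open(src, "wb") as f:
    f.write("caf\u00e9,\u2603\n".encode("utf-8") + b"bad\xff\n")

sanitize_characters(src, dst)

with open(dst, encoding="utf-8") as f:
    result = f.read()
```

With errors='replace' on the input side, each undecodable byte becomes U+FFFD, a character that cp1252 cannot represent. That may be why the write fails only when the output file falls back to the default encoding.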

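I am using Windows 10, Anaconda 4.5.0, Anaconda Navigator 1.8.2, Spyder 3.2.8 and Jupyter Notebook 5.4.0.

I am not a complete beginner at coding, but this is my first time learning Python for data analysis, so I hope this is not some very obvious difference between Python 2 and 3.

I tried chcp 65001 in my Windows cmd, but it did not seem to help.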
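
A possible reason chcp 65001 makes no difference: it changes the code page of that console window, while open() without an encoding argument typically falls back to locale.getpreferredencoding(False), which on Windows reflects the system ANSI code page. The sketch below is a diagnostic under that assumption. The helper can_encode is a hypothetical name introduced here, and the printed encodings will vary by machine.

```python
import locale
import sys

# Encoding that open() typically uses when no encoding argument is given.
default_file_encoding = locale.getpreferredencoding(False)
print("default text-file encoding:", default_file_encoding)

# Encoding of the attached console or IDE output stream, a separate setting.
print("stdout encoding:", sys.stdout.encoding)


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+FFFD is the replacement character produced by errors='replace' on decode.
print("cp1252 can encode U+FFFD:", can_encode("\ufffd", "cp1252"))
print("utf-8 can encode U+FFFD:", can_encode("\ufffd", "utf-8"))
```

If the first line printed is cp1252, it would be consistent with the traceback, and passing an explicit encoding to open() avoids depending on that default.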

Thank you very much in advance.

0 answers:

No answers