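I am trying to follow this NLP example: https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e
The dataset and the Jupyter notebook file can be found here: https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb
The provided code was written for Python 2. I tried to adapt it to Python 3 and ran into a UnicodeEncodeError.
I also tried it with Python 2, and there it works as expected.
Note: for this tutorial I installed TensorFlow (CPU), Keras, and the NLTK package. I also removed the "codecs" import because it is not needed in Python 3.
I will only show the part that fails. I am using Spyder rather than Jupyter Notebook.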
# -*- coding: utf-8 -*-
import keras
import nltk
import pandas as pd
import numpy as np
import re
#import codecs

input_file = open("socialmedia_relevant_cols.csv", 'r', encoding='utf-8', errors='replace')
output_file = open("socialmedia_relevant_cols_clean.csv", "w")

def sanitize_characters(raw, clean):
    for line in input_file:
        out = line
        output_file.write(line)

sanitize_characters(input_file, output_file)
This is the error:
Traceback (most recent call last):
  File "<ipython-input-6-1c76dfb0b166>", line 1, in <module>
    runfile('C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py', wdir='C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master')
  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py", line 26, in <module>
    sanitize_characters(input_file, output_file)
  File "C:/Users/Timothy Cumberland/Desktop/concrete_NLP_tutorial-master/concrete_NLP_tutorial-master/NLP_tutorial.py", line 24, in sanitize_characters
    output_file.write(line)
  File "C:\ProgramData\Anaconda3\envs\TensorFlow\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>
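I am using Windows 10, Anaconda 4.5.0, Anaconda Navigator 1.8.2, Spyder 3.2.8, and Jupyter Notebook 5.4.0.
I am not a complete beginner at coding, but this is my first time learning Python for data analysis, so I hope this is not some very obvious difference between Python 2 and 3.
I tried chcp 65001 in my Windows cmd, but it did not seem to help.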
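For reference, the last frames of the traceback are in cp1252.py and are reached from output_file.write(line), which suggests that the output file, opened without an encoding argument, is being encoded with the Windows default code page rather than UTF-8. chcp changes the console code page, so it would not be expected to affect what open() picks for a file. Below is a minimal sketch of the variant that this reading points to, assuming the only change needed is an explicit encoding on the output file. The path-based signature and the use of `with` are illustrative choices, not the tutorial's original code.

```python
def sanitize_characters(raw_path, clean_path):
    """Copy raw_path to clean_path as UTF-8 text.

    Bytes that are not valid UTF-8 are replaced with U+FFFD on read
    (errors='replace'), as in the code above. The output file is opened
    with an explicit UTF-8 encoding so that writing does not depend on
    the platform default (cp1252 in the traceback above).
    """
    with open(raw_path, "r", encoding="utf-8", errors="replace") as raw, \
         open(clean_path, "w", encoding="utf-8") as clean:
        for line in raw:
            clean.write(line)
```

It would be called as sanitize_characters("socialmedia_relevant_cols.csv", "socialmedia_relevant_cols_clean.csv"). Any later step that reads the cleaned file, such as a pandas read_csv call, would then presumably need to read it as UTF-8 as well.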
Thank you very much in advance.