python中utf-8的问题

时间:2016-12-11 08:58:05

标签: python pandas

我的代码如下。

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import codecs

df1 = pd.read_csv(r'E:\내논문자료\wordcloud\test1\1311_1312.csv',encoding='utf-8')

df2 = df1.groupby(['address']).size().reset_index()
df2.rename(columns = {0: 'frequency'}, inplace = True)
print(df2[:100])

但是当我执行这段代码时,我收到了这条消息

Traceback (most recent call last):
File "E:/빅데이터 캠퍼스/untitled1/groupby freq.py", line 7, in <module>
df1 = pd.read_csv(r'E:\내논문자료\wordcloud\test1\1311_1312.csv',encoding='utf-8')
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 645, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 400, in _read
data = parser.read()
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 938, in read
ret = self._engine.read(nrows)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1507, in read
 data = self._reader.read(nrows)
File "pandas\parser.pyx", line 846, in pandas.parser.TextReader.read (pandas\parser.c:10364)
File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10640)
File "pandas\parser.pyx", line 945, in pandas.parser.TextReader._read_rows (pandas\parser.c:11677)
File "pandas\parser.pyx", line 1047, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13111)
File "pandas\parser.pyx", line 1106, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:14065)
File "pandas\parser.pyx", line 1204, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:16121)
File "pandas\parser.pyx", line 1220, in pandas.parser.TextReader._string_convert (pandas\parser.c:16349)
File "pandas\parser.pyx", line 1452, in pandas.parser._string_box_utf8 (pandas\parser.c:22014)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

我该如何解决? 我应该改变pandas中的解析器代码吗?

1 个答案:

答案 0 :(得分:1)

看起来你的源数据还没有用UTF-8编码 - 它可能是其他编解码器之一。根据{{​​3}},您可能希望尝试使用advanced开始,或encoding='GBK'