UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' when converting a Series object to unicode in pandas (utf-16 file)

Date: 2014-04-16 18:18:19

Tags: python python-2.7 unicode pandas

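I have a utf-16 CSV file that I'm trying to load into pandas. By default, the data comes in with the object dtype. I plan to do some modeling with the caption column, so I'd like to convert df['caption'] from object to a unicode string. The line df['caption'] = df['caption'].astype(unicode) currently fails with: UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 6: ordinal not in range(128)

I tried to work around this by calling the encode and decode functions on the individual values in the df['caption'] column, but I couldn't get it to work.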






UnicodeEncodeError: Traceback (most recent call last)
<ipython-input-5-aad36f4acf38> in <module>()
    10 print df['caption'].head(10)
---> 12 df['caption']=df['caption'].astype(unicode)

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/generic.pyc in astype(self, dtype, copy, raise_on_error)
   2017         mgr = self._data.astype(
-> 2018             dtype, copy=copy, raise_on_error=raise_on_error)
   2019         return self._constructor(mgr).__finalize__(self)

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in astype(self, *args, **kwargs)
   2415     def astype(self, *args, **kwargs):
-> 2416         return self.apply('astype', *args, **kwargs)
   2418     def convert(self, *args, **kwargs):

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   2374             else:
-> 2375                 applied = getattr(blk, f)(*args, **kwargs)
   2377             if isinstance(applied, list):

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in astype(self, dtype, copy, raise_on_error, values)
    425     def astype(self, dtype, copy=False, raise_on_error=True, values=None):
    426         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 427                             values=values)
    429     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/internals.pyc in _astype(self, dtype, copy, raise_on_error, values, klass)
    442             # force the copy here
    443             if values is None:
--> 444                 values = com._astype_nansafe(self.values, dtype, copy=True)
    445             newb = make_block(values, self.items, self.ref_items,
    446                               ndim=self.ndim, placement=self._ref_locs,

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
   2222         return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   2223     elif issubclass(dtype.type, compat.string_types):
-> 2224         return lib.astype_str(arr.ravel()).reshape(arr.shape)
   2226     if copy:

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.astype_str (pandas/lib.c:12944)()

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.astype_str (pandas/lib.c:12862)()

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 6: ordinal not in range(128)
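
The last frames (lib.astype_str) suggest that the values are being pushed through an ASCII byte-string conversion, which cannot represent the curly quote u'\u201c'. A minimal sketch of that mechanism, outside pandas, is below. The sample text and the helper name try_ascii are made up for illustration, and the explicit .encode('ascii') call only stands in for the implicit conversion that Python 2 appears to perform here.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value: a caption containing curly quotes (U+201C / U+201D).
text = u'0:03 \u201cHello\u201d'


def try_ascii(value):
    """Encode value as ASCII; return the bytes, or the exception if it fails."""
    try:
        return value.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc


# The curly quote has no ASCII representation, so encoding fails.
result = try_ascii(text)
print(repr(result))

# Encoding with a codec that covers the character round-trips without loss.
round_trip = text.encode('utf-8').decode('utf-8')
print(round_trip == text)
```

The text itself is already valid unicode; the failure only appears once something tries to squeeze it into ASCII bytes.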


import pandas as pd
import numpy as np

df = pd.read_csv('Chevrolet_4-7-2014_cvid_data.csv',
                 encoding='utf-16',
                 header=0,
                 na_values=['N/A', ''],
                 names=['channel', 'link', 'title', 'posted', 'views',
                        'likes', 'dislikes', 'description', 'category',
                        'statdate', 'statviews', 'timewatched', 'averagetw',
                        'subsdriven', 'shares', 'caption'])
print df.head(5)
print df.dtypes

print df['caption'].head(10)



channel                                        link  \
0  Chevrolet  http://www.youtube.com/watch?v=dCayKZe6WvI   
1  Chevrolet  http://www.youtube.com/watch?v=IRXK35dPXbE   
2  Chevrolet  http://www.youtube.com/watch?v=XXdj4QMw748   
3  Chevrolet  http://www.youtube.com/watch?v=_ger32ROs94   
4  Chevrolet  http://www.youtube.com/watch?v=Chfm7Pou49k   
5  Chevrolet  http://www.youtube.com/watch?v=ySmEJyQ94BI   

                                           title       posted   views  \
0  Chevy Open House Event: From Our House to Your...  Apr  1 2014   73111   
1  Truck Towing Capabilities: 2014 Silverado -- #...  Mar 26 2014   11934   
2  Potholes at the Milford Proving Grounds: Tips ...  Mar 20 2014    8037   
3  Diesel Trucks: Heavy Duty Strengths -- 2015 Si...  Mar 20 2014   12096   
4  Captain America: All in a Day's Work -- 2014 T...  Mar 14 2014   93377   
5  Media Blasting: Camaro Engineering -- 2014 Cam...  Mar 13 2014  109931   

   likes  dislikes                                        description  \
0     43        13  In March over 100000 people visited our Chevy ...   
1    183        56  Farmer Dewayne Kleman and General Motors engin...   
2     58        10  Chevrolet vehicles are carefully designed to w...   
3    210         6  Introducing the all-new 2015 Silverado HD. The...   
4   1095        35  From saving the world to working on math homew...   

       category statdate  statviews timewatched averagetw  subsdriven  \
0  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
1  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
2  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
3  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   
4  Autos & Vehicles      NaN        NaN         NaN       NaN         NaN   

   shares                                            caption  
0     NaN   The Chevy Spring Open House Sale the perfect ...  
1     NaN   0:03 A Man And His Truck And An Engineer / To...  
2     NaN   0:02 Severe Bump road sign 0:07 Pothole Facil...  
3     NaN   0:03 And there's no stronger Silverado than t...  
4     NaN   0:03 Are you doing anything fun Saturday nigh...  
5     NaN   0:05 Camaro Z/28 logo 0:07 Z/28 Bead Lock 0:0...  

[5 rows x 16 columns]
channel         object
link            object
title           object
posted          object
views           object
likes            int64
dislikes         int64
description     object
category        object
statdate        object
statviews      float64
timewatched     object
averagetw       object
subsdriven     float64
shares         float64
caption         object

dtype: object
0     The Chevy Spring Open House Sale the perfect ...
1     0:03 A Man And His Truck And An Engineer / To...
2     0:02 Severe Bump road sign 0:07 Pothole Facil...
3     0:03 And there's no stronger Silverado than t...
4     0:03 Are you doing anything fun Saturday nigh...
5     0:05 Camaro Z/28 logo 0:07 Z/28 Bead Lock 0:0...

Name: caption, dtype: object

Could you try adding dtype={'caption' : str} to the read_csv() call? Like:

df = pd.read_csv('Chevrolet_4-7-2014_cvid_data.csv',
     encoding='utf-16',  # kept from the question: the file is utf-16
     dtype={'caption' : str})

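BTW, pandas uses header=0 by default when names is not passed. I can't see your CSV, but the names keyword argument may be redundant: if the column names are in row 0 of the CSV, they are picked up automatically. Either way, let me know whether this works for you. :)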
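
To check the suggestion without the original file, here is a rough sketch that reads a tiny utf-16 CSV from memory. The two-column sample data is invented for illustration, and it is written for a current Python 3 / pandas install rather than the Python 2.7 setup in the question, so treat it as a sanity check of the idea and not as a drop-in fix.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one row, curly quotes in the caption.
sample = u'channel,caption\nChevrolet,0:03 \u201cHello\u201d\n'
raw = sample.encode('utf-16')  # bytes as they would sit on disk, BOM included

# Decode while reading, and ask for the caption column as strings.
df = pd.read_csv(io.BytesIO(raw), encoding='utf-16', dtype={'caption': str})

# The value is already decoded text, so no further astype() call is needed.
first = df['caption'].iloc[0]
print(type(first), repr(first))
```

If the values already come back as decoded text like this, the extra astype(unicode) step may simply not be needed.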