Python & pandas: unable to drop columns

Time: 2015-08-13 13:16:49

Tags: python pandas

Here is part of my data:

USAF,   NCDC,  Date,     HrMn, I, Type,  QCP,  Dir, Q, I, Spd,   Q, 
034820,99999,19490801,0000,4,  SAO,    ,270,1,N,  5.7,1,
034820,99999,19490801,0100,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0200,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0300,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0400,4,  SAO,    ,270,1,N,  6.7,1,
034820,99999,19490801,0500,4,  SAO,    ,270,1,N,  8.7,1,
034820,99999,19490801,0600,4,  SAO,    ,270,1,N,  8.2,1,
034820,99999,19490801,0700,4,  SAO,    ,270,1,N,  8.2,1,

I tried to drop some columns, but pandas reports that some of them do not exist.

ipath= "C:\Users\Administrator\Desktop\科研风速数据资料\Marham\\marham.txt"
uipath = unicode(ipath , "utf8")
file2 = open(uipath)
df = pd.read_csv(uipath,header=0)
# I can drop USAF, but unable to drop NCDC, I, Q, 
df.drop(['USAF', 'NCDC'], 1,inplace=True)
df.describe()

It reports:

ValueError: labels ['NCDC'] not contained in axis

What is wrong with my code? I have always used df.drop(['a', 'b', 'c'], 1,inplace=True) before.

Update:

df.columns.tolist()
['USAF',
 '   NCDC',
 '  Date',
 '     HrMn',
 ' I',
 ' Type',
 '  QCP',
 '  Dir',
 ' Q',
 ' I.1',
 ' Spd',
 '   Q',
 ' ']

1 answer:

Answer 0 (score: 2)

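The default separator for read_csv is a comma, but your data also has spaces after the commas. Those spaces become leading whitespace in the header names and in the data, so the column is actually named '   NCDC' rather than 'NCDC'. If you pass skipinitialspace=True, the file is imported correctly: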

In [106]:
import io
import pandas as pd
t="""USAF,   NCDC,  Date,     HrMn, I, Type,  QCP,  Dir, Q, I, Spd,   Q, 
034820,99999,19490801,0000,4,  SAO,    ,270,1,N,  5.7,1,
034820,99999,19490801,0100,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0200,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0300,4,  SAO,    ,270,1,N,  4.6,1,
034820,99999,19490801,0400,4,  SAO,    ,270,1,N,  6.7,1,
034820,99999,19490801,0500,4,  SAO,    ,270,1,N,  8.7,1,
034820,99999,19490801,0600,4,  SAO,    ,270,1,N,  8.2,1,
034820,99999,19490801,0700,4,  SAO,    ,270,1,N,  8.2,1,"""
df = pd.read_csv(io.StringIO(t), skipinitialspace=True)
df

Out[106]:
    USAF   NCDC      Date  HrMn  I Type  QCP  Dir  Q I.1  Spd  Q.1  \
0  34820  99999  19490801     0  4  SAO  NaN  270  1   N  5.7    1   
1  34820  99999  19490801   100  4  SAO  NaN  270  1   N  4.6    1   
2  34820  99999  19490801   200  4  SAO  NaN  270  1   N  4.6    1   
3  34820  99999  19490801   300  4  SAO  NaN  270  1   N  4.6    1   
4  34820  99999  19490801   400  4  SAO  NaN  270  1   N  6.7    1   
5  34820  99999  19490801   500  4  SAO  NaN  270  1   N  8.7    1   
6  34820  99999  19490801   600  4  SAO  NaN  270  1   N  8.2    1   
7  34820  99999  19490801   700  4  SAO  NaN  270  1   N  8.2    1   

   Unnamed: 12  
0          NaN  
1          NaN  
2          NaN  
3          NaN  
4          NaN  
5          NaN  
6          NaN  
7          NaN
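
To confirm that the leading spaces are the cause, the column labels from both ways of reading the data can be compared directly. This is a minimal sketch using two rows of the sample above; the variable names `sample`, `raw`, and `clean` are chosen here for illustration only.

```python
import io
import pandas as pd

# Header plus two rows of the sample data from the question.
sample = """USAF,   NCDC,  Date,     HrMn, I, Type,  QCP,  Dir, Q, I, Spd,   Q, 
034820,99999,19490801,0000,4,  SAO,    ,270,1,N,  5.7,1,
034820,99999,19490801,0100,4,  SAO,    ,270,1,N,  4.6,1,
"""

# Default parsing keeps the spaces that follow each comma.
raw = pd.read_csv(io.StringIO(sample))
# skipinitialspace=True discards whitespace right after the delimiter.
clean = pd.read_csv(io.StringIO(sample), skipinitialspace=True)

raw_names = raw.columns.tolist()
clean_names = clean.columns.tolist()
print(raw_names)    # labels such as '   NCDC' still carry the spaces
print(clean_names)  # labels such as 'NCDC' match what drop() is given
```

Since 'NCDC' only appears among the labels of the second frame, dropping it by that name fails on the first frame and succeeds on the second.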

Then you can drop the columns:

In [108]:
df.drop(['USAF', 'NCDC'], axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8 entries, 0 to 7
Data columns (total 11 columns):
Date           8 non-null int64
HrMn           8 non-null int64
I              8 non-null int64
Type           8 non-null object
QCP            0 non-null float64
Dir            8 non-null int64
Q              8 non-null int64
I.1            8 non-null object
Spd            8 non-null float64
Q.1            8 non-null int64
Unnamed: 12    0 non-null float64
dtypes: float64(3), int64(6), object(2)
memory usage: 768.0+ bytes
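
If the DataFrame has already been loaded with the default settings, a possible alternative (not part of the answer above) is to strip the whitespace from the column labels afterwards. The sketch below assumes the same sample data; `sample` is an illustrative name. Note that this only cleans the labels: string values such as '  SAO' keep their leading spaces, and labels that differed only by whitespace (' Q' and '   Q') both become 'Q', so skipinitialspace=True is usually the cleaner fix.

```python
import io
import pandas as pd

# Header plus two rows of the sample data from the question.
sample = """USAF,   NCDC,  Date,     HrMn, I, Type,  QCP,  Dir, Q, I, Spd,   Q, 
034820,99999,19490801,0000,4,  SAO,    ,270,1,N,  5.7,1,
034820,99999,19490801,0100,4,  SAO,    ,270,1,N,  4.6,1,
"""

# Default parsing: labels keep their leading spaces.
df = pd.read_csv(io.StringIO(sample))

# Remove leading and trailing whitespace from every column label.
df.columns = df.columns.str.strip()

# Dropping by the plain names now works.
df = df.drop(columns=['USAF', 'NCDC'])
print(df.columns.tolist())
```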