Here is part of my data:
USAF, NCDC, Date, HrMn, I, Type, QCP, Dir, Q, I, Spd, Q,
034820,99999,19490801,0000,4, SAO, ,270,1,N, 5.7,1,
034820,99999,19490801,0100,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0200,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0300,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0400,4, SAO, ,270,1,N, 6.7,1,
034820,99999,19490801,0500,4, SAO, ,270,1,N, 8.7,1,
034820,99999,19490801,0600,4, SAO, ,270,1,N, 8.2,1,
034820,99999,19490801,0700,4, SAO, ,270,1,N, 8.2,1,
I tried to drop some columns, but pandas reports that some of them do not exist.
ipath= "C:\Users\Administrator\Desktop\科研风速数据资料\Marham\\marham.txt"
uipath = unicode(ipath , "utf8")
file2 = open(uipath)
df = pd.read_csv(uipath,header=0)
# I can drop USAF, but unable to drop NCDC, I, Q,
df.drop(['USAF', 'NCDC'], 1,inplace=True)
df.describe()
It reports:
ValueError: labels ['NCDC'] not contained in axis
What is wrong with my code? I have always used df.drop(['a', 'b', 'c'], 1, inplace=True) before without problems.
Update:
df.columns.tolist()
['USAF',
' NCDC',
' Date',
' HrMn',
' I',
' Type',
' QCP',
' Dir',
' Q',
' I.1',
' Spd',
' Q',
' ']
Answer 0 (score: 2)
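The default separator for read_csv is a comma, but your data has a space after some of the commas. Those spaces end up as leading whitespace in the column names and in the values, so the column is actually named ' NCDC' rather than 'NCDC'. If you pass skipinitialspace=True, the file is imported cleanly: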
In [106]:
import io
import pandas as pd
t="""USAF, NCDC, Date, HrMn, I, Type, QCP, Dir, Q, I, Spd, Q,
034820,99999,19490801,0000,4, SAO, ,270,1,N, 5.7,1,
034820,99999,19490801,0100,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0200,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0300,4, SAO, ,270,1,N, 4.6,1,
034820,99999,19490801,0400,4, SAO, ,270,1,N, 6.7,1,
034820,99999,19490801,0500,4, SAO, ,270,1,N, 8.7,1,
034820,99999,19490801,0600,4, SAO, ,270,1,N, 8.2,1,
034820,99999,19490801,0700,4, SAO, ,270,1,N, 8.2,1,"""
df = pd.read_csv(io.StringIO(t), skipinitialspace=True)
df
Out[106]:
    USAF   NCDC      Date  HrMn  I Type  QCP  Dir  Q I.1  Spd  Q.1  \
0  34820  99999  19490801     0  4  SAO  NaN  270  1   N  5.7    1
1  34820  99999  19490801   100  4  SAO  NaN  270  1   N  4.6    1
2  34820  99999  19490801   200  4  SAO  NaN  270  1   N  4.6    1
3  34820  99999  19490801   300  4  SAO  NaN  270  1   N  4.6    1
4  34820  99999  19490801   400  4  SAO  NaN  270  1   N  6.7    1
5  34820  99999  19490801   500  4  SAO  NaN  270  1   N  8.7    1
6  34820  99999  19490801   600  4  SAO  NaN  270  1   N  8.2    1
7  34820  99999  19490801   700  4  SAO  NaN  270  1   N  8.2    1

   Unnamed: 12
0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7          NaN
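To see the effect of the flag on the header alone, here is a minimal sketch that parses a shortened, made-up three-column version of the sample (the `sample` string is an assumption for illustration, not the real file) with and without skipinitialspace and prints the resulting column names:

```python
import io
import pandas as pd

# Shortened stand-in for the real file: same "comma + space" header style.
sample = "USAF, NCDC, Date\n034820,99999,19490801\n"

# Default parsing keeps the space after each comma as part of the name.
raw = pd.read_csv(io.StringIO(sample))
print(raw.columns.tolist())    # ['USAF', ' NCDC', ' Date']

# skipinitialspace=True discards whitespace that follows the delimiter.
clean = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(clean.columns.tolist())  # ['USAF', 'NCDC', 'Date']
```

This is why dropping 'USAF' worked in the question while 'NCDC' did not: only the first header field has no space in front of it.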
Then you can drop the columns:
In [108]:
df.drop(['USAF', 'NCDC'], axis=1, inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 8 entries, 0 to 7
Data columns (total 11 columns):
Date           8 non-null int64
HrMn           8 non-null int64
I              8 non-null int64
Type           8 non-null object
QCP            0 non-null float64
Dir            8 non-null int64
Q              8 non-null int64
I.1            8 non-null object
Spd            8 non-null float64
Q.1            8 non-null int64
Unnamed: 12    0 non-null float64
dtypes: float64(3), int64(6), object(2)
memory usage: 768.0+ bytes
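If the DataFrame has already been loaded without skipinitialspace, one possible alternative is to clean the labels afterwards. The sketch below uses a shortened, made-up sample (the `sample` string is an assumption, not the real file); it strips whitespace from the column names, removes the all-empty column produced by the trailing comma, and then drops by the plain names. Note that this only fixes the labels: string values such as ' SAO' would still carry their leading space, which skipinitialspace=True avoids at read time.

```python
import io
import pandas as pd

# Shortened stand-in for the real file, including the trailing comma.
sample = (
    "USAF, NCDC, Date, Spd, Q,\n"
    "034820,99999,19490801, 5.7,1,\n"
    "034820,99999,19490801, 4.6,1,\n"
)

df = pd.read_csv(io.StringIO(sample))

# Remove surrounding whitespace from every column label.
df.columns = df.columns.str.strip()

# Drop columns that hold no values at all (here, the trailing-comma column).
df = df.dropna(axis=1, how="all")

# The plain names now match, so the drop succeeds.
df = df.drop(columns=["USAF", "NCDC"])
print(df.columns.tolist())  # ['Date', 'Spd', 'Q']
```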