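I parsed a large number of XML files into a pandas DataFrame. I now need to drop some columns for data analysis. I could not find this exact error in previous questions. I used: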
data = data["Rig Mode","Bit on Bottom","Block Position","Block Velocity",..]
and received an error message (the complete error message is at the end of the post):
KeyError: 'Key length (22) exceeds index depth (2)'
So I searched and found this post, which mentions an error related to lexsort depth, whereas my error is exactly the one shown above. Following that post, I sorted the index:
`data = data.sort_index(level=1)`
pd.__version__
'0.22.0'
Python version - 3.6.4
and got exactly the same error. Below are the details of my MultiIndex:
data.columns
#MultiIndex(levels=[['Bit on Bottom','Block Position', 'Block Velocity', 'Rig Mode',...], ['', '1/min', 'L/min', 'dega', ...]],
labels=[[38, 0, 2, 22, ...]],
names=['Description', 'Unit'])
This is how I built the MultiIndex while preparing the DataFrame, since the column headers had been parsed as rows in the dataset:
data.columns = pd.MultiIndex.from_arrays([data.iloc[0],data.iloc[1]], names = ['Description','Unit'])
data=data.iloc[2:]
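As a minimal sketch of that header-promotion step, here is the same pattern on a tiny made-up frame. The name `raw` and the sample values are hypothetical stand-ins for the parsed XML data, not the real dataset.

```python
import pandas as pd

# Hypothetical raw frame: row 0 holds the descriptions, row 1 holds the units,
# and the measurements start at row 2.
raw = pd.DataFrame([
    ["Rig Mode", "Block Position"],
    ["", "m"],
    [1, 10.5],
    [1, 10.7],
])

# Promote the first two rows to a two-level column index.
raw.columns = pd.MultiIndex.from_arrays(
    [raw.iloc[0], raw.iloc[1]], names=["Description", "Unit"]
)

# Drop the two header rows and renumber the remaining rows from 0.
data = raw.iloc[2:].reset_index(drop=True)
```

The resulting columns are tuples such as `("Block Position", "m")`, which is why a plain comma-separated key is later interpreted level by level.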
#### complete error message:
> --------------------------------------------------------------------------- KeyError Traceback (most recent call
> last) <ipython-input-119-60ad57c2383f> in <module>()
> 3 "Continuous Survey Depth","Pump 1 Stroke Rate","Pump 2 Stroke Rate","Pump 3 Stroke Rate",
> 4 "Average Standpipe Pressure","Slips stat (1=Out,0=In)", "Weight on Bit","Mud Flow
> In","Time","Average Surface Torque",
> ----> 5 "MWD Turbine RPM"]
>
> ~\Anaconda3\lib\site-packages\pandas\core\frame.py in
> __getitem__(self, key) 2135 return self._getitem_frame(key) 2136 elif is_mi_columns:
> -> 2137 return self._getitem_multilevel(key) 2138 else: 2139 return self._getitem_column(key)
>
> ~\Anaconda3\lib\site-packages\pandas\core\frame.py in
> _getitem_multilevel(self, key) 2179 2180 def _getitem_multilevel(self, key):
> -> 2181 loc = self.columns.get_loc(key) 2182 if isinstance(loc, (slice, Series, np.ndarray, Index)): 2183
> new_columns = self.columns[loc]
>
> ~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in
> get_loc(self, key, method) 2076 if self.nlevels < keylen:
> 2077 raise KeyError('Key length ({0}) exceeds index depth
> ({1})'
> -> 2078 ''.format(keylen, self.nlevels)) 2079 2080 if keylen == self.nlevels and self.is_unique:
>
> KeyError: 'Key length (22) exceeds index depth (2)'
Answer 0 (score: 1)
To select a subset of columns, you must use `[[ ]]`:
data = data[["Rig Mode","Bit on Bottom","Block Position","Block Velocity",..]]
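`__getitem__` is heavily overloaded. A bare `data["A", "B"]` passes the single tuple `("A", "B")`, which a MultiIndex reads as one label per level, so a 22-label key exceeds an index depth of 2.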
In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])
In [12]: df
Out[12]:
A B
0 1 2
1 3 4
2 5 6
In [13]: df["A"]
Out[13]:
0 1
1 3
2 5
Name: A, dtype: int64
In [14]: df["A", "B"]
KeyError: ('A', 'B')
With a MultiIndex, the same syntax tries to select a single column:
In [21]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=[["A", "AA"], ["B", "BB"]])
In [22]: df
Out[22]:
A AA
B BB
0 1 2
1 3 4
2 5 6
In [23]: df["A"]
Out[23]:
B
0 1
1 3
2 5
In [24]: df["A", "B"]
Out[24]:
0 1
1 3
2 5
Name: (A, B), dtype: int64
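Putting the two cases together, the sketch below contrasts a tuple key with a list key on two-level columns. The column labels are borrowed from the question, but the units and values are invented for illustration. The `drop` call at the end is one possible way to remove columns by their first-level label, not something the question used.

```python
import pandas as pd

# Hypothetical two-level columns (Description, Unit), loosely modeled on the question.
columns = pd.MultiIndex.from_arrays(
    [["Rig Mode", "Bit on Bottom", "Block Position", "Block Velocity"],
     ["", "", "m", "m/s"]],
    names=["Description", "Unit"],
)
data = pd.DataFrame([[1, 0, 10.5, 0.2], [1, 1, 10.7, 0.1]], columns=columns)

# A comma-separated key is a tuple: one label per level. Three labels cannot
# fit a two-level index, so pandas raises KeyError.
try:
    data["Rig Mode", "Bit on Bottom", "Block Position"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# A list key selects several first-level labels and keeps both column levels.
subset = data[["Rig Mode", "Block Position"]]

# Alternative when the goal is to remove columns: drop by first-level label.
dropped = data.drop(columns=["Bit on Bottom", "Block Velocity"], level="Description")
```

Here `subset` and `dropped` end up with the same two columns; which form is more convenient depends on whether the list of columns to keep or the list to discard is shorter.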