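I just discovered that the keyword columns is parsed in the query method, but what is the correct way to use it? I did not find an explanation in the docs.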
>>> df = pd.DataFrame(np.arange(6).reshape(2,3))
>>> df
   0  1  2
0  0  1  2
1  3  4  5
>>> df.query("columns >= 0")
   0  1  2
0  0  1  2
1  3  4  5
>>> print(df.query("columns >= 2"))
Empty DataFrame
Columns: [0, 1, 2]
Index: []
# with unnamed columns, "columns" seems to behave like "index"
>>> df2 = pd.DataFrame(np.arange(6).reshape(2,3), columns = ['a','b','c'])
>>> df2
   a  b  c
0  0  1  2
1  3  4  5
>>> df2.query('columns >= 2')
IndexingError: Unalignable boolean Series key provided
Answer 0: (score: 1)
DataFrame.query is equivalent to first calling DataFrame.eval and then using the result of the evaluation to index the original DataFrame.
In [9]: idx = df.eval('columns >= 1')
In [10]: idx
Out[10]:
0    False
1     True
2     True
dtype: bool
In [11]: df.loc[idx]
Out[11]:
   0  1  2
1  3  4  5
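The same two-step equivalence can be checked on an ordinary column expression. This is only a minimal sketch: the names frame, mask, via_loc and via_query are made up for illustration and do not come from the answer above.

```python
import pandas as pd

# Hypothetical example frame, not taken from the question
frame = pd.DataFrame({"a": [0, 3, 6], "b": [1, 4, 7]})

# Step 1: eval turns the expression into a boolean Series on the row index
mask = frame.eval("a > 1")

# Step 2: that Series is used to index the original frame
via_loc = frame.loc[mask]

# query should give the same rows in a single call
via_query = frame.query("a > 1")

print(via_loc.equals(via_query))
```

Because the expression refers to a column, the mask is labelled by the row index and aligns with the frame without trouble.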
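Here 'columns' is equivalent to df.columns, and the result of eval is a Series whose index is df.columns. So when the columns are named, the result of eval cannot be used to index the original DataFrame, because its labels do not match the row index.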
In [13]: idx2 = df2.eval('columns >= 1')
In [14]: idx2
Out[14]:
a    True
b    True
c    True
dtype: bool
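To see why one mask works and the other does not, the sketch below builds both boolean Series by hand instead of going through eval, so it does not depend on how a given Python version compares string labels with integers. The names mask, mask2, rows and error_name are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3))
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

# Mask labelled 0, 1, 2 (the column labels of df). The row labels 0 and 1
# happen to exist in it, so .loc can align the mask with the rows.
mask = pd.Series([False, True, True], index=df.columns)
rows = df.loc[mask]
print(rows)

# Mask labelled a, b, c. None of the row labels 0 and 1 can be found in it,
# so the alignment fails and .loc raises.
mask2 = pd.Series([True, True, True], index=df2.columns)
try:
    df2.loc[mask2]
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)
```

In both cases the mask describes columns but is applied to rows. The first case succeeds only because the default column labels overlap with the default row labels.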
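In fact, using 'columns' in a query expression is a bad idea. The first example only works because the returned Series happens to be usable as an indexer, which will not hold in the general case.
For example, simply changing the shape of the DataFrame leads to an error.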
In [15]: df3 = pd.DataFrame(np.arange(6).reshape(3,2))
In [16]: df3.query('columns >= 1')
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
<ipython-input-26-75248672c992> in <module>()
1 df = pd.DataFrame(np.arange(6).reshape(3,2))
----> 2 df.query('columns >= 1')
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
1936
1937 try:
-> 1938 return self.loc[res]
1939 except ValueError:
1940 # when res is multi-dimensional loc raises, but this is sometimes a
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
1187 return self._getitem_tuple(key)
1188 else:
-> 1189 return self._getitem_axis(key, axis=0)
1190
1191 def _getitem_axis(self, key, axis=0):
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
1304 return self._get_slice_axis(key, axis=axis)
1305 elif is_bool_indexer(key):
-> 1306 return self._getbool_axis(key, axis=axis)
1307 elif is_list_like_indexer(key):
1308
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getbool_axis(self, key, axis)
1194 def _getbool_axis(self, key, axis=0):
1195 labels = self.obj._get_axis(axis)
-> 1196 key = check_bool_indexer(labels, key)
1197 inds, = key.nonzero()
1198 try:
/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in check_bool_indexer(ax, key)
1657 mask = com.isnull(result.values)
1658 if mask.any():
-> 1659 raise IndexingError('Unalignable boolean Series key provided')
1660
1661 result = result.astype(bool).values
IndexingError: Unalignable boolean Series key provided
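If the goal is to select columns by a condition on their labels, one option that avoids query altogether is a boolean mask on df.columns passed to the column axis of .loc. This is a sketch of an alternative, not something the answer above proposes; col_mask, selected, named and picked are hypothetical names.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.arange(6).reshape(3, 2))

# Compare the column labels directly; the result is a boolean array
# with one entry per column.
col_mask = df3.columns >= 1

# Apply the mask on the column axis, keeping every row
selected = df3.loc[:, col_mask]
print(selected)

# With named columns, a membership test gives the same kind of mask
named = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])
picked = named.loc[:, named.columns.isin(["b", "c"])]
print(picked)
```

Since the mask is applied on the axis it was computed from, no alignment against the row index is involved.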