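What does the keyword 'columns' mean in the pandas query method?

Time: 2015-08-08 13:06:17

Tags: python pandas

I just discovered that the keyword columns is parsed inside the query method, but what is the correct way to use it? I could not find an explanation in the docs.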

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(6).reshape(2,3))
>>> df
   0  1  2
0  0  1  2
1  3  4  5

>>> df.query("columns >= 0")

   0  1  2
0  0  1  2
1  3  4  5

>>> print df.query("columns >= 2")
Empty DataFrame
Columns: [0, 1, 2]
Index: []

# With unnamed columns, 'columns' seems to behave like 'index'

>>> df2 = pd.DataFrame(np.arange(6).reshape(2,3), columns = ['a','b','c'])
>>> df2
   a  b  c
0  0  1  2
1  3  4  5

>>> df2.query('columns >= 2')
IndexingError: Unalignable boolean Series key provided

1 answer:

Answer 0 (score: 1)

DataFrame.query is equivalent to first calling DataFrame.eval and then using the result of the evaluation to index the original DataFrame.

In [9]: idx = df.eval('columns >= 1')

In [10]: idx
Out[10]:
0    False
1     True
2     True
dtype: bool

In [11]: df.loc[idx]
Out[11]:
   0  1  2
1  3  4  5
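The same two-step behaviour can be checked with an ordinary column expression. This is only a minimal sketch: the frame and the expression 'a > 0' are chosen for illustration and do not come from the question's failing case.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

# Step 1: eval turns the expression into a boolean Series labelled by the rows.
mask = df2.eval("a > 0")

# Step 2: that Series is used to index the original frame.
by_hand = df2.loc[mask]

# query performs both steps in one call.
by_query = df2.query("a > 0")

print(by_hand.equals(by_query))
```

Because the mask here is labelled by the row index, it lines up with the rows of the frame, which is the situation query is designed for.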

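In df.eval('columns >= 1'), 'columns' refers to df.columns. The result of eval is therefore a Series whose index is df.columns, so when the columns are named, the result of eval cannot be used to index the original DataFrame.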

In [13]: idx2 = df2.eval('columns >= 1')

In [14]: idx2
Out[14]:
a    True
b    True
c    True
dtype: bool
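The label mismatch can be reproduced without eval. In the sketch below the masks are built by hand as stand-ins for the eval results, and rows_selected_by is a hypothetical helper written for this illustration; it reports the exception name instead of letting the error propagate.

```python
import numpy as np
import pandas as pd

def rows_selected_by(frame, mask):
    """Return the row labels picked by frame.loc[mask], or the error's class name."""
    try:
        return list(frame.loc[mask].index)
    except Exception as exc:  # pandas raises IndexingError for an unalignable mask
        return type(exc).__name__

df = pd.DataFrame(np.arange(6).reshape(2, 3))
df2 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

# Hand-built stand-ins for the eval results: boolean values labelled by the columns.
mask = pd.Series([False, True, True], index=df.columns)
mask2 = pd.Series([True, True, True], index=df2.columns)

# Row labels 0 and 1 exist among the mask labels 0, 1, 2, so alignment succeeds.
result_unnamed = rows_selected_by(df, mask)

# Row labels 0 and 1 do not exist among the mask labels a, b, c, so alignment fails.
result_named = rows_selected_by(df2, mask2)

print(result_unnamed, result_named)
```

The boolean Series is matched against the row labels of the frame, so it only works when every row label also appears in the mask's index.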

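In fact, using 'columns' in a query expression is a bad idea. In the first example, the returned Series merely happened to be usable as an index for the rows, but that does not hold in general.

For example, simply changing the shape of the DataFrame causes an error.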

In [15]: df3 = pd.DataFrame(np.arange(6).reshape(3,2))

In [16]: df3.query('columns >= 1')
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-26-75248672c992> in <module>()
      1 df = pd.DataFrame(np.arange(6).reshape(3,2))
----> 2 df.query('columns >= 1')

/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
   1936 
   1937         try:
-> 1938             return self.loc[res]
   1939         except ValueError:
   1940             # when res is multi-dimensional loc raises, but this is sometimes a

/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1187             return self._getitem_tuple(key)
   1188         else:
-> 1189             return self._getitem_axis(key, axis=0)
   1190 
   1191     def _getitem_axis(self, key, axis=0):

/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1304             return self._get_slice_axis(key, axis=axis)
   1305         elif is_bool_indexer(key):
-> 1306             return self._getbool_axis(key, axis=axis)
   1307         elif is_list_like_indexer(key):
   1308 

/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getbool_axis(self, key, axis)
   1194     def _getbool_axis(self, key, axis=0):
   1195         labels = self.obj._get_axis(axis)
-> 1196         key = check_bool_indexer(labels, key)
   1197         inds, = key.nonzero()
   1198         try:

/home/ubuntu/anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in check_bool_indexer(ax, key)
   1657         mask = com.isnull(result.values)
   1658         if mask.any():
-> 1659             raise IndexingError('Unalignable boolean Series key provided')
   1660 
   1661         result = result.astype(bool).values

IndexingError: Unalignable boolean Series key provided
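
If the intent behind 'columns >= 1' was to filter columns rather than rows (an assumption; the question does not say so), one possible alternative is to pass a boolean array over the columns to .loc directly. This is a sketch of that idea, not something the answer itself proposes.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.arange(6).reshape(3, 2))

# A boolean array over the columns goes in the second slot of .loc,
# so it is matched against the columns instead of the rows.
selected = df3.loc[:, df3.columns >= 1]

print(selected)
```

Since the mask is applied along the column axis, the number of rows no longer matters.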