Question

I'm new to Python and am trying to get a subset of rows/columns from a DataFrame:

In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

In [2]:
example=DataFrame(np.random.rand(6,5),columns=['a','b','c','d','e'])

In [3]:
example.a=[2,4,6,8,10,12]

In [4]:
example

Out[4]:
    a         b         c         d         e
0   2  0.225608  0.023888  0.535053  0.953350
1   4  0.803721  0.741708  0.256522  0.062574
2   6  0.354936  0.597274  0.801495  0.763515
3   8  0.204974  0.870951  0.220088  0.446273
4  10  0.673855  0.693210  0.494213  0.842049
5  12  0.516609  0.038669  0.972165  0.183945

In [5]:
example[['a','b','d','e']].query('a==10')

Out[5]:
    a         b         d         e
4  10  0.673855  0.494213  0.842049

In [6]:
example[['b','d','e']].query('a==10')

.....

UndefinedVariableError: name 'a' is not defined

The first case works, but the second query raises an error. Do you know why this happens? Thank you very much.

Answer 1

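example[['b','d','e']] is only a subset of example, and that subset does not contain column a.

To get the values of ['b','d','e'] for the rows where a==10, you just need to swap the order of the query and the indexing. Run the query first, which returns only the matching rows with every column still present, then index into that result (the example here selects ['b','c','d'], but the pattern is the same):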

In[113]: example.query('a==10')[['b','c','d']]
Out[113]: 
          b         c         d
4  0.439672  0.181699  0.770421
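As a rough sketch of the same idea with the columns from the question, the snippet below filters first and selects afterwards, and also shows a boolean-mask form with .loc that should give the same rows. The seeded random data and the names via_query and via_loc are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame (seeded for repeatability).
rng = np.random.default_rng(0)
example = pd.DataFrame(rng.random((6, 5)), columns=["a", "b", "c", "d", "e"])
example["a"] = [2, 4, 6, 8, 10, 12]

# Filter rows while column 'a' is still present, then pick the columns.
via_query = example.query("a == 10")[["b", "d", "e"]]

# Alternative: do both steps at once with a boolean mask and .loc.
via_loc = example.loc[example["a"] == 10, ["b", "d", "e"]]

print(via_query)
print(via_query.equals(via_loc))
```

Either form works here; .loc avoids parsing a query string, while query tends to read more naturally for longer conditions.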

Answer 2

When you create the second selection, example[['b','d','e']], you are effectively dropping column 'a' from the data frame:

example[['b','d','e']]
          b         d         e
0  0.910757  0.565006  0.284420
1  0.601034  0.697879  0.983803
2  0.516938  0.829621  0.471825
3  0.896217  0.663177  0.093502
4  0.277488  0.796543  0.643166
5  0.594420  0.759634  0.164800

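So you are trying to access a column that no longer exists. In other words, if you want to query on a column of the data frame, that column has to be included in the selection before you run the query.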
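A minimal sketch of that point, assuming the same kind of frame as in the question: it checks that 'a' is absent from the narrowed selection, then keeps 'a' in the selection long enough to query on it and drops it afterwards. The seeded data and the names subset and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame (seeded for repeatability).
rng = np.random.default_rng(0)
example = pd.DataFrame(rng.random((6, 5)), columns=["a", "b", "c", "d", "e"])
example["a"] = [2, 4, 6, 8, 10, 12]

# The narrowed selection has no column 'a', so query('a==10') cannot resolve it.
subset = example[["b", "d", "e"]]
print("a" in subset.columns)  # False

# Include 'a' in the selection, query on it, then drop it from the result.
result = example[["a", "b", "d", "e"]].query("a == 10").drop(columns="a")
print(result)
```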

Querying a Pandas DataFrame
