Question

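I have the following pandas DataFrame. Each point is associated with 'n' class points for each class, and each combination has a value of 0 or 1. Now, for each point, I want to get the class with the highest number of '0' values. Expected output: Pt.1 - a, Pt.2 - b

I tried a hash table, but it is a bit cumbersome. What would be an elegant pandas DataFrame query for this?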

+------+-------+-------+
| Pt.  | class | value |
+------+-------+-------+
| Pt.1 | a     |     0 |
| Pt.1 | a     |     0 |
| Pt.1 | a     |     1 |
| Pt.1 | b     |     0 |
| Pt.1 | b     |     1 |
| Pt.1 | b     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | a     |     1 |
| Pt.2 | b     |     0 |
| Pt.2 | b     |     0 |
| Pt.2 | b     |     0 |
+------+-------+-------+

Answer 1

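First keep only the rows where value is 0 using boolean indexing, then group by Pt. and call value_counts on the class column. value_counts sorts its output by count in descending order, so the first index value is the class with the most zeros: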

# Store the result in a new name so the original df stays available.
out = (df[df['value'] == 0].groupby('Pt.')['class']
                           .apply(lambda x: x.value_counts().index[0])
                           .reset_index(name='top1'))
print(out)
    Pt. top1
0  Pt.1    a
1  Pt.2    b

A similar alternative that filters with query:

# Same logic, with the filter written as a query string.
out = (df.query("value == 0")
         .groupby('Pt.')['class']
         .apply(lambda x: x.value_counts().index[0])
         .reset_index(name='top1'))
print(out)
    Pt. top1
0  Pt.1    a
1  Pt.2    b
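
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question (assuming the lowercase "pt.1" row was meant to be "Pt.1") and uses a slightly different route than the answer above: it counts the zeros per (point, class) pair with groupby and size, then picks the column with the largest count per row via idxmax. The names zero_counts and top are only illustrative.

```python
import pandas as pd

# Sample data from the question; the "pt.1" row is assumed to be a typo for "Pt.1".
df = pd.DataFrame({
    "Pt.": ["Pt.1"] * 6 + ["Pt.2"] * 6,
    "class": ["a", "a", "a", "b", "b", "b"] * 2,
    "value": [0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
})

# Number of zeros for every (point, class) pair, as a table with one
# row per point and one column per class. Missing pairs count as 0.
zero_counts = (df[df["value"] == 0]
               .groupby(["Pt.", "class"])
               .size()
               .unstack(fill_value=0))

# For each point, the class (column label) holding the largest zero count.
top = zero_counts.idxmax(axis=1).reset_index(name="top1")
print(top)
```

On a tie, idxmax returns the first column holding the maximum, so with this layout the result follows the column order of the classes. Points with no zero rows at all are dropped by the filter, just as in the answer above.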
