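I have the following pandas DataFrame. Each point has n rows for each class, and each row has a value of 0 or 1. For each point, I want to get the class with the highest number of 0s. Expected output: Pt.1 - a, Pt.2 - b
I tried a hash table, but it is a bit cumbersome. What would be an elegant pandas DataFrame query for this?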
+------+-------+-------+
| Pt.  | class | value |
+------+-------+-------+
| Pt.1 | a     | 0     |
| Pt.1 | a     | 0     |
| Pt.1 | a     | 1     |
| Pt.1 | b     | 0     |
| Pt.1 | b     | 1     |
| Pt.1 | b     | 1     |
| Pt.2 | a     | 1     |
| Pt.2 | a     | 1     |
| Pt.2 | a     | 1     |
| Pt.2 | b     | 0     |
| Pt.2 | b     | 0     |
| Pt.2 | b     | 0     |
+------+-------+-------+
Answer 0 (score: 1)
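First keep only the rows where value is 0 using boolean indexing, then group by Pt. and call value_counts on class within each group. value_counts sorts its output by count in descending order, so the first index value is the class with the most 0s: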
df = (df[df['value'] == 0]
      .groupby('Pt.')['class']
      .apply(lambda x: x.value_counts().index[0])
      .reset_index(name='top1'))
print(df)
    Pt. top1
0  Pt.1    a
1  Pt.2    b
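For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question's table, treats the lowercase pt.1 row as a typo for Pt.1, and stores the result in a separate variable named top (an illustrative name, not from the answer) so the original df is not overwritten.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
# (assumption: the lowercase "pt.1" row is a typo for "Pt.1").
df = pd.DataFrame({
    'Pt.': ['Pt.1'] * 6 + ['Pt.2'] * 6,
    'class': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'value': [0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
})

# Keep only the rows with value 0, then pick the most frequent class per point.
top = (df[df['value'] == 0]
       .groupby('Pt.')['class']
       .apply(lambda x: x.value_counts().index[0])
       .reset_index(name='top1'))
print(top)
```

Note that a point with no 0 rows at all is removed by the filter, so it will not appear in the result.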
A similar alternative that uses query for the filtering step:
df = (df.query("value == 0")
      .groupby('Pt.')['class']
      .apply(lambda x: x.value_counts().index[0])
      .reset_index(name='top1'))
print(df)
    Pt. top1
0  Pt.1    a
1  Pt.2    b
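If you would rather avoid the per-group lambda, one possible variation is to count the 0s per (Pt., class) pair with size and keep the row with the largest count for each point. This is a different technique from the two snippets above, shown only as a sketch. The names counts, top and zeros are illustrative, and the sample data again assumes the lowercase pt.1 row in the question is a typo.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
# (assumption: the lowercase "pt.1" row is a typo for "Pt.1").
df = pd.DataFrame({
    'Pt.': ['Pt.1'] * 6 + ['Pt.2'] * 6,
    'class': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'value': [0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0],
})

# Number of 0s for every (point, class) pair that has at least one 0.
counts = (df.query("value == 0")
            .groupby(['Pt.', 'class'])
            .size()
            .reset_index(name='zeros'))

# Sort so the largest count comes first, then keep one row per point.
top = (counts.sort_values('zeros', ascending=False)
             .drop_duplicates('Pt.')
             .sort_values('Pt.')
             .rename(columns={'class': 'top1'})
             .reset_index(drop=True)[['Pt.', 'top1']])
print(top)
```

When two classes tie on the number of 0s for a point, which one is kept is not guaranteed here, so add an explicit tie-break column to the sort if that matters.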