I'm new to pandas and I'm a bit stuck. Can you help me?
I have a DataFrame that stores orders:
| item | store_status | customer_status |
|------|--------------|-----------------|
| A | 'dispatched' | 'received' |
| A | 'dispatched' | 'pending' |
| B | 'pending' | 'pending' |
| B | 'dispatched' | 'received' |
| B | 'dispatched' | 'pending' |
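I want to create a new DataFrame that shows, for each item, the fraction of orders that are both 'dispatched' and 'received'. So the result would be: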
| item | dispatched_and_received |
|------|-------------------------|
| A | 0.5 |
| B | 0.33 |
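I'm also interested in the fraction of each item's orders that are 'dispatched', regardless of customer status, and would like to add it as a new column to this DataFrame: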
| item | dispatched_and_received | dispatched |
|------|-------------------------|------------|
| A | 0.5 | 1.00 |
| B | 0.33 | 0.66 |
Thanks!
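To experiment with this data, the sample orders table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the status values are plain strings such as `dispatched` (the quotes in the table are taken to be notation only) and that the frame is called `df`.

```python
import pandas as pd

# Sample orders from the question (assumption: statuses are plain strings, no quotes)
df = pd.DataFrame({
    "item": ["A", "A", "B", "B", "B"],
    "store_status": ["dispatched", "dispatched", "pending", "dispatched", "dispatched"],
    "customer_status": ["received", "pending", "pending", "received", "pending"],
})
print(df)
```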
Answer 0 (score: 2)
Create Boolean Series that check each condition, then take the mean of those Series within each group.
```python
(df.assign(dispatched=df.store_status.eq('dispatched'),
           dispatched_and_received=(df.store_status.eq('dispatched')
                                    & df.customer_status.eq('received')))
   .groupby('item')[['dispatched', 'dispatched_and_received']]
   .mean()
   .reset_index())
#  item  dispatched  dispatched_and_received
#0    A    1.000000                 0.500000
#1    B    0.666667                 0.333333
```
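Taking the mean works because pandas treats `True` as 1 and `False` as 0, so the mean of a Boolean Series is the share of `True` values. A small sketch for item B's `dispatched_and_received` flags (the values are read off the sample data; the name `b_flags` is just illustrative):

```python
import pandas as pd

# Item B's orders: only the second one is both dispatched and received
b_flags = pd.Series([False, True, False])

# Mean of booleans = (number of True values) / (number of rows)
fraction = b_flags.mean()
count_based = b_flags.sum() / len(b_flags)
print(fraction, count_based)
```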
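In the chained version above, `assign` only creates the columns; if all the chaining looks confusing, you can split that part out manually beforehand. It is equivalent to: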
```python
df['dispatched'] = df.store_status.eq('dispatched')
df['dispatched_and_received'] = df['dispatched'] & df.customer_status.eq('received')
```
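The two assignments only add the flag columns; the per-item averaging still has to follow. A self-contained sketch of the fully unchained version, assuming the sample data from the question with plain-string statuses (the column order is chosen here to match the question's desired output):

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["A", "A", "B", "B", "B"],
    "store_status": ["dispatched", "dispatched", "pending", "dispatched", "dispatched"],
    "customer_status": ["received", "pending", "pending", "received", "pending"],
})

# Step 1: Boolean flag columns, one per condition
df["dispatched"] = df.store_status.eq("dispatched")
df["dispatched_and_received"] = df["dispatched"] & df.customer_status.eq("received")

# Step 2: average the flags within each item to get the fractions
result = (
    df.groupby("item")[["dispatched_and_received", "dispatched"]]
    .mean()
    .reset_index()
)
print(result)
```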
This is what the DataFrame looks like after the `assign` step:
```
  item store_status customer_status  dispatched  dispatched_and_received
0    A   dispatched        received        True                     True
1    A   dispatched         pending        True                    False
2    B      pending         pending       False                    False
3    B   dispatched        received        True                     True
4    B   dispatched         pending        True                    False
```
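If you prefer to name the output columns explicitly and show two decimals as in the question's table, pandas' named aggregation (available since pandas 0.25) is one option; this is an alternative sketch, not part of the original answer, and it again assumes plain-string statuses. Note that rounding gives 0.67 for item B's `dispatched` fraction, whereas the question's table shows the truncated 0.66.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["A", "A", "B", "B", "B"],
    "store_status": ["dispatched", "dispatched", "pending", "dispatched", "dispatched"],
    "customer_status": ["received", "pending", "pending", "received", "pending"],
})

# Same Boolean flags as in the answer
flags = df.assign(
    dispatched=df.store_status.eq("dispatched"),
    dispatched_and_received=df.store_status.eq("dispatched") & df.customer_status.eq("received"),
)

# Named aggregation: output_name=(input_column, aggregation)
summary = (
    flags.groupby("item")
    .agg(
        dispatched_and_received=("dispatched_and_received", "mean"),
        dispatched=("dispatched", "mean"),
    )
    .round(2)
    .reset_index()
)
print(summary)
```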