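Pandas: share of values meeting a condition, and adding a new column

Time: 2020-04-07 14:26:26

Tags: pandas dataframe pandas-groupby

I'm new to pandas and have gotten a bit stuck. Can you help me?

I have a DataFrame that stores orders: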

| item | store_status | customer_status |
|------|--------------|-----------------|
| A    | 'dispatched' | 'received'      |
| A    | 'dispatched' | 'pending'       |
| B    | 'pending'    | 'pending'       |
| B    | 'dispatched' | 'received'      |
| B    | 'dispatched' | 'pending'       |

I want to create a new DataFrame that shows, for each item, the fraction of rows that are both 'dispatched' and 'received'. The result would be:

| item | dispatched_and_received |
|------|-------------------------|
| A    | 0.5                     |
| B    | 0.33                    |

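I'm also interested in the fraction of each item that is 'dispatched', regardless of customer status, and would like to add it as a new column to this DataFrame: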

| item | dispatched_and_received | dispatched |
|------|-------------------------|------------|
| A    | 0.5                     | 1.00       |
| B    | 0.33                    | 0.66       |

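Thanks!

1 answer:

Answer 0: (score: 2)

Create Boolean Series that check the conditions, then take the mean of those Series within each group. Because True counts as 1 and False as 0, the group mean is the fraction of rows that satisfy the condition.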

(df.assign(dispatched=df.store_status.eq('dispatched'),
           dispatched_and_received=(df.store_status.eq('dispatched')
                                    & df.customer_status.eq('received')))
   .groupby('item')[['dispatched', 'dispatched_and_received']]
   .mean()
   .reset_index())

#  item  dispatched  dispatched_and_received
#0    A    1.000000                 0.500000
#1    B    0.666667                 0.333333
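
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question's table, with the status values stored as plain strings without the surrounding quotes; the intermediate names `dispatched` and `received` are introduced only for readability.

```python
import pandas as pd

# Sample data copied from the question's table (quotes around the
# status values are assumed to be display-only, not part of the data)
df = pd.DataFrame({
    "item": ["A", "A", "B", "B", "B"],
    "store_status": ["dispatched", "dispatched", "pending",
                     "dispatched", "dispatched"],
    "customer_status": ["received", "pending", "pending",
                        "received", "pending"],
})

# Boolean Series, one value per order
dispatched = df["store_status"].eq("dispatched")
received = df["customer_status"].eq("received")

# The mean of a Boolean column within a group is the fraction of True rows
result = (
    df.assign(dispatched=dispatched,
              dispatched_and_received=dispatched & received)
      .groupby("item")[["dispatched", "dispatched_and_received"]]
      .mean()
      .reset_index()
)
print(result)
```

If two decimals are enough, appending `.round(2)` to the chain should tidy the output.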

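The assign just creates the columns; if all of the chaining looks a bit confusing, you can split that step out manually. It is equivalent to: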

df['dispatched'] = df.store_status.eq('dispatched')
df['dispatched_and_received'] = df['dispatched'] & df.customer_status.eq('received')

This is the DataFrame after the assign:
  item store_status customer_status  dispatched  dispatched_and_received
0    A   dispatched        received        True                     True
1    A   dispatched         pending        True                    False
2    B      pending         pending       False                    False
3    B   dispatched        received        True                     True
4    B   dispatched         pending        True                    False
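
The Boolean columns in this table are what make the mean work as a fraction. As a small illustration, here is a sketch using only the three `dispatched` flags for item B; the variable names are made up for this example.

```python
import pandas as pd

# The 'dispatched' flags for item B, taken from the table above
b_dispatched = pd.Series([False, True, True])

count_true = b_dispatched.sum()   # True counts as 1, so this counts the True rows
fraction = b_dispatched.mean()    # number of True rows divided by the group size
print(count_true, round(fraction, 2))
```

Rounded to two decimals this gives 0.67, so the 0.66 in the question's expected table looks truncated rather than rounded.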