Difference between groupby sums in pandas

Time: 2018-10-11 04:37:54

Tags: pandas pandas-groupby

DF:

    fruits     date      amount
0   Apple   2018-01-01   100
1   Orange  2018-01-01   200
2   Apple   2018-01-01   150
3   Apple   2018-01-02   100
4   Orange  2018-01-02   100
5   Orange  2018-01-02   100

Code to create it:

import pandas as pd

f = [["Apple","2018-01-01",100],["Orange","2018-01-01",200],["Apple","2018-01-01",150],
 ["Apple","2018-01-02",100],["Orange","2018-01-02",100],["Orange","2018-01-02",100]]
df = pd.DataFrame(f,columns = ["fruits","date","amount"])

I am trying to sum the sales of each fruit per date and then find the difference between those sums.

Expected output:

date          diff
2018-01-01      50
2018-01-02    -100

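That is, for each date, take the sum of Apple sales and the sum of Orange sales, and compute Apple minus Orange.

I am able to find the sums: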

df.groupby(["date","fruits"])["amount"].agg("sum") 

   date        fruits
 2018-01-01    Apple     250
               Orange    200
 2018-01-02    Apple     100
               Orange    200
  Name: amount, dtype: int64

Any suggestions on how to compute the difference within pandas itself?

3 Answers:

Answer 0 (score: 1)

a

Output

b

Answer 1 (score: 1)

Add unstack to reshape the fruits into columns, then subtract, using pop to extract (and drop) each column:

df = df.groupby(["date","fruits"])["amount"].sum().unstack()
df['diff'] = df.pop('Apple') - df.pop('Orange')
print (df)
fruits      diff
date            
2018-01-01    50
2018-01-02  -100
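
A variant of the same reshape-then-subtract idea, as a minimal sketch: `pivot_table` with `aggfunc="sum"` groups and reshapes in one call, and the original `df` is left untouched instead of being overwritten. The names `wide` and `result` are illustrative only and do not come from the answer above.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200], ["Apple", "2018-01-01", 150],
     ["Apple", "2018-01-02", 100], ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

# One row per date, one column per fruit, summed amounts in the cells.
wide = df.pivot_table(index="date", columns="fruits", values="amount", aggfunc="sum")

# Apple minus Orange, returned as a one-column DataFrame named "diff".
result = (wide["Apple"] - wide["Orange"]).to_frame("diff")
print(result)
```

Because `wide` still holds both the Apple and Orange columns, the per-fruit sums remain available for inspection alongside the difference.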

Answer 2 (score: 0)

Use groupby on date, then apply a lambda function:

df.groupby("date").apply(lambda x: x.loc[x['fruits']=='Apple','amount'].sum() - 
                                   x.loc[x['fruits']=='Orange','amount'].sum())

date
2018-01-01     50
2018-01-02   -100
dtype: int64
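
The same per-date result can also be sketched without `apply`, by negating the Orange amounts and summing once per date. This assumes the `fruits` column contains only Apple and Orange, as in the sample data; the names `signed` and `diff` are illustrative.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200], ["Apple", "2018-01-01", 150],
     ["Apple", "2018-01-02", 100], ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100]]
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

# Keep Apple amounts as they are and flip the sign of everything else
# (assumption: "everything else" is Orange only).
signed = df["amount"].where(df["fruits"] == "Apple", -df["amount"])

# A single grouped sum now yields Apple total minus Orange total per date.
diff = signed.groupby(df["date"]).sum()
print(diff)
```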

Or group each fruit separately and take the difference:

A = df[df.fruits.isin(['Apple'])].groupby('date')['amount'].sum()
O = df[df.fruits.isin(['Orange'])].groupby('date')['amount'].sum()

A - O
date
2018-01-01     50
2018-01-02   -100
Name: amount, dtype: int64
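
One caveat with subtracting two separately grouped series: if a date has sales for only one fruit, the indexes do not line up and plain subtraction gives NaN for that date. A hedged sketch using `Series.sub` with `fill_value=0`, computing Apple minus Orange as in the expected output. The 2018-01-03 row is made up for illustration and is not part of the question's data.

```python
import pandas as pd

f = [["Apple", "2018-01-01", 100], ["Orange", "2018-01-01", 200], ["Apple", "2018-01-01", 150],
     ["Apple", "2018-01-02", 100], ["Orange", "2018-01-02", 100], ["Orange", "2018-01-02", 100],
     ["Apple", "2018-01-03", 70]]  # hypothetical row: a date with no Orange sales
df = pd.DataFrame(f, columns=["fruits", "date", "amount"])

apple = df[df["fruits"] == "Apple"].groupby("date")["amount"].sum()
orange = df[df["fruits"] == "Orange"].groupby("date")["amount"].sum()

# Plain subtraction leaves NaN where one side has no entry for the date.
naive = apple - orange

# fill_value=0 treats a missing fruit total as zero before subtracting.
diff = apple.sub(orange, fill_value=0)
print(diff)
```

The result is a float series here, because index alignment introduces missing values before they are filled.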