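Find the highest cumulative percentage change

Time: 2018-01-22 08:31:25

Tags: pandas

I have a data frame that records daily sales per product. I need to find the product with the fastest growth. In this example, ice-cream sales between 22 and 23 January show the highest increase of all products.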

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

myst="""
20-01-17    pizza   90
21-01-17    pizza   120
22-01-17    pizza   239
23-01-17    pizza   200
20-01-17    fried-rice  100
21-01-17    fried-rice  120
22-01-17    fried-rice  110
23-01-17    fried-rice  190
20-01-17    ice-cream   8
21-01-17    ice-cream   23
22-01-17    ice-cream   21
23-01-17    ice-cream   100
"""
u_cols=['date', 'product', 'sales']

This is how I create the data frame:

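import pandas as pd

# The columns are separated by runs of whitespace (tabs or spaces),
# so split on any whitespace rather than on a single tab.
df = pd.read_csv(StringIO(myst), sep=r'\s+', names=u_cols)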

In a spreadsheet it looks like this. How would I do this in pandas?

[Image: highest_sales using pandas]

1 answer:

Answer 0 (score: 2)

I think you need pct_change, applied per product with groupby:

df['new'] = df.groupby('product')['sales'].pct_change().mul(100)
print (df)
        date     product  sales         new
0   20-01-17       pizza     90         NaN
1   21-01-17       pizza    120   33.333333
2   22-01-17       pizza    239   99.166667
3   23-01-17       pizza    200  -16.317992
4   20-01-17  fried-rice    100         NaN
5   21-01-17  fried-rice    120   20.000000
6   22-01-17  fried-rice    110   -8.333333
7   23-01-17  fried-rice    190   72.727273
8   20-01-17   ice-cream      8         NaN
9   21-01-17   ice-cream     23  187.500000
10  22-01-17   ice-cream     21   -8.695652
11  23-01-17   ice-cream    100  376.190476

a = df.groupby('product')['sales'].pct_change().idxmax()
print (a)
11

b = 'sale: {}, during: from {} to {}'.format(df.loc[a, 'product'], 
                                            df.loc[a-1, 'date'],
                                            df.loc[a, 'date'])
print (b)
sale: ice-cream, during: from 22-01-17 to 23-01-17
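
The lookup above reads the start date from df.loc[a-1, 'date'], which works here because each product's rows are contiguous, already in date order, and the index is the default 0..n-1. As a minimal sketch for data where that may not hold, the previous date can be taken per product with groupby and shift. The names data, prev_date, best and summary are illustrative and not part of the original answer; the sample data is re-entered with single spaces.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question, whitespace-separated.
data = """
20-01-17 pizza 90
21-01-17 pizza 120
22-01-17 pizza 239
23-01-17 pizza 200
20-01-17 fried-rice 100
21-01-17 fried-rice 120
22-01-17 fried-rice 110
23-01-17 fried-rice 190
20-01-17 ice-cream 8
21-01-17 ice-cream 23
22-01-17 ice-cream 21
23-01-17 ice-cream 100
"""

df = pd.read_csv(StringIO(data), sep=r"\s+", names=["date", "product", "sales"])

# Parse day-month-year dates and put each product's rows in time order.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%y")
df = df.sort_values(["product", "date"]).reset_index(drop=True)

# Day-over-day percentage change and the previous date, both within each product.
grouped = df.groupby("product")
df["pct"] = grouped["sales"].pct_change().mul(100)
df["prev_date"] = grouped["date"].shift()

# Row with the largest single-day percentage increase across all products.
best = df.loc[df["pct"].idxmax()]

summary = "product: {}, from {} to {}, change: {:.2f}%".format(
    best["product"],
    best["prev_date"].strftime("%d-%m-%y"),
    best["date"].strftime("%d-%m-%y"),
    best["pct"],
)
print(summary)
```

Because prev_date is computed inside each group, the reported start date always belongs to the same product as the end date, even if the input rows are interleaved or unsorted.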