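I have a data frame that records daily sales, and I need to find the product with the fastest growth. In this example, ice-cream sales from 22 to 23 January show the largest increase of any product.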
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
myst="""
20-01-17 pizza 90
21-01-17 pizza 120
22-01-17 pizza 239
23-01-17 pizza 200
20-01-17 fried-rice 100
21-01-17 fried-rice 120
22-01-17 fried-rice 110
23-01-17 fried-rice 190
20-01-17 ice-cream 8
21-01-17 ice-cream 23
22-01-17 ice-cream 21
23-01-17 ice-cream 100
"""
u_cols=['date', 'product', 'sales']
This is how I create the data frame:
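import pandas as pd
myf = StringIO(myst)
# The sample rows above are whitespace-separated as shown, so split on runs of whitespace (this also matches tabs)
df = pd.read_csv(myf, sep=r'\s+', names=u_cols)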
This is what it looks like in a spreadsheet. How would pandas handle it?
Answer 0 (score: 2)
I think you need pct_change:
df['new'] = df.groupby('product')['sales'].pct_change().mul(100)
print (df)
        date     product  sales         new
0   20-01-17       pizza     90         NaN
1   21-01-17       pizza    120   33.333333
2   22-01-17       pizza    239   99.166667
3   23-01-17       pizza    200  -16.317992
4   20-01-17  fried-rice    100         NaN
5   21-01-17  fried-rice    120   20.000000
6   22-01-17  fried-rice    110   -8.333333
7   23-01-17  fried-rice    190   72.727273
8   20-01-17   ice-cream      8         NaN
9   21-01-17   ice-cream     23  187.500000
10  22-01-17   ice-cream     21   -8.695652
11  23-01-17   ice-cream    100  376.190476
a = df.groupby('product')['sales'].pct_change().idxmax()
print (a)
11
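# a - 1 is the previous row; this assumes the default integer index and dates sorted within each product
b = 'sale: {}, during: from {} to {}'.format(df.loc[a, 'product'],
                                             df.loc[a-1, 'date'],
                                             df.loc[a, 'date'])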
print (b)
sale: ice-cream, during: from 22-01-17 to 23-01-17
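The lookup with a-1 only works while the rows keep their default integer index and are already ordered by date within each product. As a minimal sketch of a variant that does not depend on row order, the dates can be parsed, the frame sorted, and the previous date carried along with shift. The column names prev_date and growth_pct and the variable names raw, best and summary are made up for this illustration, and the whitespace separator is an assumption about how the sample data is laid out.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question, assumed to be whitespace-separated.
raw = """\
20-01-17 pizza 90
21-01-17 pizza 120
22-01-17 pizza 239
23-01-17 pizza 200
20-01-17 fried-rice 100
21-01-17 fried-rice 120
22-01-17 fried-rice 110
23-01-17 fried-rice 190
20-01-17 ice-cream 8
21-01-17 ice-cream 23
21-01-17 ice-cream 23
"""
# Drop the accidental duplicate line above so the data matches the question exactly.
raw = raw.rsplit("21-01-17 ice-cream 23\n", 1)[0]
raw += "22-01-17 ice-cream 21\n23-01-17 ice-cream 100\n"

df = pd.read_csv(StringIO(raw), sep=r"\s+", names=["date", "product", "sales"])

# Parse day-month-year strings so that sorting is chronological, not lexical.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%y")
df = df.sort_values(["product", "date"]).reset_index(drop=True)

grouped = df.groupby("product")
df["prev_date"] = grouped["date"].shift()                  # previous day within the same product
df["growth_pct"] = grouped["sales"].pct_change().mul(100)  # day-over-day change in percent

# Row with the largest growth; NaN rows (first day of each product) are skipped by idxmax.
best = df.loc[df["growth_pct"].idxmax()]
summary = "product: {}, from {:%d-%m-%y} to {:%d-%m-%y}, growth: {:.1f}%".format(
    best["product"], best["prev_date"], best["date"], best["growth_pct"]
)
print(summary)
# product: ice-cream, from 22-01-17 to 23-01-17, growth: 376.2%
```

Because prev_date is computed inside each group, the start of the period always comes from the same product, even if the input rows arrive in a different order.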