I have this DataFrame:
bal:
year id unit period Revenues Ativo Não-Circulante \
business_id
9564 2012 302 dsada anual 5964168.52 10976013.70
9564 2011 303 dsada anual 5774707.15 10867868.13
2361 2013 304 dsada anual 3652575.31 6608468.52
2361 2012 305 dsada anual 321076.15 6027066.03
2361 2011 306 dsada anual 3858137.49 9733126.02
2369 2012 307 dsada anual 351373.66 9402830.89
8104 2012 308 dsada anual 3503226.02 6267307.01
...
I want to create a column named "Growth". It would be:
(this year's Revenues / last year's Revenues) - 1
The DataFrame should look like this:
year id unit period Revenues Growth \
business_id
9564 2012 302 dsada anual 5964168.52 0.0328
9564 2011 303 dsada anual 5774707.15 NaN
2361 2013 304 dsada anual 3652575.31 10.37
2361 2012 305 dsada anual 321076.15 -0.91
2361 2011 306 dsada anual 3858137.49 NaN
2369 2012 307 dsada anual 351373.66 NaN
8104 2012 308 dsada anual 3503226.02 NaN
...
How can I do this?
Answer 0: (score: 1)
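I'm assuming your DataFrame is named df. First reset your index so that business_id is a column, then sort the result on year. Now group the DataFrame by business_id and transform the result to get the percent change in Revenues. Finally, sort by the index to restore the original row order.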
df2 = df.reset_index().sort_values(['year'])
df2 = (
    df2
    .assign(Growth=df2.groupby(['business_id'])['Revenues'].transform(
        lambda group: group.pct_change()))
    .sort_index()
)
>>> df2
business_id year id unit period Revenues Ativo Não-Circulante Growth
0 9564 2012 302 dsada anual 5964168.52 10976013.70 0.032809
1 9564 2011 303 dsada anual 5774707.15 10867868.13 NaN
2 2361 2013 304 dsada anual 3652575.31 6608468.52 10.376041
3 2361 2012 305 dsada anual 321076.15 6027066.03 -0.916779
4 2361 2011 306 dsada anual 3858137.49 9733126.02 NaN
5 2369 2012 307 dsada anual 351373.66 9402830.89 NaN
6 8104 2012 308 dsada anual 3503226.02 6267307.01 NaN
I think your expected output is wrong:
5964168.52 / 5774707.15 - 1 = 0.0328 # vs. 0.16 shown.
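For anyone who wants to run the steps in this answer end to end, here is a minimal sketch. It rebuilds the sample rows from the question with only the year and Revenues columns (the trimmed column set is an assumption made for brevity) and calls pct_change directly on the grouped column, which should be equivalent to the transform with a lambda.

```python
import pandas as pd

# Sample rows copied from the question; the other columns are omitted for brevity.
df = pd.DataFrame(
    {
        "year": [2012, 2011, 2013, 2012, 2011, 2012, 2012],
        "Revenues": [5964168.52, 5774707.15, 3652575.31, 321076.15,
                     3858137.49, 351373.66, 3503226.02],
    },
    index=pd.Index([9564, 9564, 2361, 2361, 2361, 2369, 8104], name="business_id"),
)

# Make business_id a regular column and order the rows chronologically.
df2 = df.reset_index().sort_values("year")

# Percent change within each business, then restore the original row order.
df2["Growth"] = df2.groupby("business_id")["Revenues"].pct_change()
df2 = df2.sort_index()

print(df2)
```

Rows that have no earlier year for the same business come out as NaN, which matches the expected output in the question.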
Answer 1: (score: 0)
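You need to "sort_values" by year and "groupby" business_id, loop over the groups to compute the growth, store the growth in a numpy array, and add it to the DataFrame.
import numpy as np

# df is your DataFrame; make business_id a column so the index is 0..n-1
df = df.reset_index()
growth = np.full(len(df), np.nan)  # stays NaN where there is no previous year
for _, rows in df.sort_values('year').groupby('business_id'):
    revenues = rows['Revenues'].to_numpy()  # this business's revenues, oldest first
    for i in range(1, len(rows)):
        # rows.index[i] is the row's position in df
        growth[rows.index[i]] = revenues[i] / revenues[i - 1] - 1
df['Growth'] = growth  # add the array to the DataFrame as a new column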
Answer 2: (score: 0)
You need groupby('business_id') and then shift to get the previous year's revenue. Save it to a new column, then compute the ratio, like this:
df.reset_index(inplace=True) # You might have to do this because it looks like your index is 'business_id'
df['Previous Revenues'] = df.sort_values('year').groupby('business_id')['Revenues'].shift(1)
df['Growth'] = df['Revenues']/df['Previous Revenues'] - 1
If you like, you don't need to save the new column, but the line gets a bit messy:
df['Growth'] = df['Revenues']/df.sort_values('year').groupby('business_id')['Revenues'].shift(1) - 1
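One caveat with shift: it pairs each row with the previous available row for that business, which is only "last year" if no year is missing. The sketch below is an addition, not part of the answer above. It shifts the year along with the revenue and blanks out the growth wherever the gap is not exactly one year. The sample frame reuses rows from the question, with the 2012 row of business 2361 removed on purpose to simulate a gap.

```python
import pandas as pd

# Rows from the question, minus business 2361's 2012 row to simulate a gap.
df = pd.DataFrame({
    "business_id": [9564, 9564, 2361, 2361],
    "year": [2012, 2011, 2013, 2011],
    "Revenues": [5964168.52, 5774707.15, 3652575.31, 3858137.49],
})

# Previous row per business, taken in chronological order.
prev = df.sort_values("year").groupby("business_id")[["year", "Revenues"]].shift(1)

# Growth versus the previous available row (pandas aligns on the index).
df["Growth"] = df["Revenues"] / prev["Revenues"] - 1

# Keep the growth only where the previous row really is the prior year.
df["Growth"] = df["Growth"].where(df["year"] - prev["year"] == 1)

print(df)
```

Whether a gap should produce NaN or a multi-year growth figure depends on what you need; this version assumes NaN is the safer default.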