我有一个pandas数据帧df(参见示例),其中包含以下列:master ID
,sub ID
(子master ID
),子ID的价格和每个主ID的方案计数/ SUBID。我想在示例中显示的每个主ID的方案之间找到价格pct_change。主ID价格是其子子ID价格的总和。对不起,如果不清楚。
df.sort_values(by="scenario", ascending=False).groupby('subID').head(n=1)
df和期望结果的示例
答案 0 :(得分:0)
试试这个:
In [38]: df
Out[38]:
id master_id price scenario
0 101 1 400 1
1 102 1 550 1
2 101 1 650 2
3 102 1 400 2
4 201 2 500 1
5 201 2 600 2
6 301 2 500 3
7 301 2 600 3
In [39]: g = df.groupby(['master_id','scenario'], as_index=False)['price']\
....: .sum()\
....: .pivot(index='master_id', columns='scenario', values='price')\
....: .reset_index()
In [40]: g
Out[40]:
scenario master_id 1 2 3
0 1 950.0 1050.0 NaN
1 2 500.0 600.0 1100.0
答案 1 :(得分:0)
因此,第一步是按主ID和方案分组,然后取消堆栈方案以获取每个列的列:
df_scen = df.groupby(['master id', 'scenario']).agg({'price': 'sum'}).unstack('scenario')
下一步是为%更改创建一个新列。您可能需要稍微使用列名称,但它应该如下所示:
df_scen['% change'] = 100 * df_scen[('scenario', 2)] / df_scen[('scenario', 1)] - 100
如果您有每个主ID的多个方案,并且您只想要最新的两个:
# Group and sort in descending order of scenario
grp = df.sort_values(['master id', 'scenario'], ascending=False).groupby('master id')
# Get only the latest and previous scenarios
df_first = grp.nth(0).reset_index() # synonymous to .first()
df_first['rev_scen'] = 'current'
df_second = grp.nth(1).reset_index()
df_second['rev_scen'] = 'previous'
df_latest = df_first.append(df_second) # Merge the two
# From here on it's basically the same as the original answer
df_scen = df_latest.groupby(['master id', 'rev_scen']).agg({'price': 'sum'}).unstack('rev_scen')
df_scen['% change'] = 100 * df_scen[('price', 'current')] / df_scen[('price', 'previous')] - 100