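How can I create new rows from an existing DataFrame by grouping on some fields (in the example, "Country" and "Industry") and applying arithmetic to another field (in the example, "Field" and "Value")?
Source DataFrame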
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail',
                                'Retail', 'Energy', 'Energy',
                                'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import',
                             'Export', 'Import', 'Export',
                             'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})
  Country Industry   Field  Value
0     USA  Finance  Import    100
1     USA  Finance  Export     50
2     USA   Retail  Import     80
3     USA   Retail  Export     10
4     USA   Energy  Import     20
5     USA   Energy  Export      5
6  Canada   Retail  Import     30
7  Canada   Retail  Export     10
Target DataFrame
Net = Import - Export
  Country Industry Field  Value
0     USA  Finance   Net     50
1     USA   Retail   Net     70
2     USA   Energy   Net     15
3  Canada   Retail   Net     20
Answer 0 (score: 8)
There are probably many ways to do this. Here is one using groupby and unstack:
(df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
.sum()
.unstack('Field')
.eval('Import - Export')
.reset_index(name='Value'))
  Country Industry  Value
0     USA  Finance     50
1     USA   Retail     70
2     USA   Energy     15
3  Canada   Retail     20
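The result above has no Field column, whereas the target DataFrame does. A minimal sketch of one way to add it, assuming the same source data as in the question; the names wide and net are illustrative and not part of the original answer, and plain column subtraction is used here in place of eval:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# One row per (Country, Industry), one column per Field value.
wide = (df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
          .sum()
          .unstack('Field'))

# Net = Import - Export, moved back into ordinary columns.
net = (wide['Import'] - wide['Export']).reset_index(name='Value')

# Insert the constant Field column at position 2 to match the target layout.
net.insert(2, 'Field', 'Net')
print(net)
```

Inserting the column by position keeps the Country, Industry, Field, Value order of the target without a separate reordering step.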
Answer 1 (score: 4)
IIUC (if I understand correctly):
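# Index by the grouping keys so the subtraction aligns rows on (Country, Industry).
indexed = df.set_index(['Country', 'Industry'])
# Net = Import - Export, as defined in the question.
Newdf = (indexed.loc[indexed.Field == 'Import', 'Value']
         - indexed.loc[indexed.Field == 'Export', 'Value']).reset_index().assign(Field='Net')
Newdf
  Country Industry  Value Field
0     USA  Finance     50   Net
1     USA   Retail     70   Net
2     USA   Energy     15   Net
3  Canada   Retail     20   Net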
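Or with pivot_table:
# Columns come out sorted as Export, Import, so diff(axis=1) leaves Import - Export in the Import column.
df.pivot_table(index=['Country','Industry'],columns='Field',values='Value',aggfunc='sum').\
diff(axis=1).\
dropna(axis=1).\
rename(columns={'Import':'Value'}).\
reset_index()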
Out[112]:
Field Country Industry  Value
0      Canada   Retail   20.0
1         USA   Energy   15.0
2         USA  Finance   50.0
3         USA   Retail   70.0
Answer 2 (score: 2)
You can use GroupBy.diff(), then recreate the Field column, and finally use DataFrame.dropna:
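# diff() yields Export - Import here because Import precedes Export in every group;
# abs() then drops the sign, so this assumes Import >= Export throughout.
df['Value'] = df.groupby(['Country', 'Industry'])['Value'].diff().abs()
df['Field'] = 'Net'
# The first row of each group has no predecessor, so its diff is NaN; drop those rows.
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)
print(df)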
  Country Industry Field  Value
0     USA  Finance   Net   50.0
1     USA   Retail   Net   70.0
2     USA   Energy   Net   15.0
3  Canada   Retail   Net   20.0
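The diff-based approach depends on the Import row preceding the Export row in each group, and abs() hides the sign if exports ever exceed imports. As an alternative that is not part of the original answer, here is a sketch that weights each row by a sign and sums per group; the names sign and net are illustrative, and the source data is assumed to be the same as in the question:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Imports count positively, exports negatively, regardless of row order.
sign = df['Field'].map({'Import': 1, 'Export': -1})

net = (df.assign(Value=df['Value'] * sign)
         .groupby(['Country', 'Industry'], sort=False, as_index=False)['Value']
         .sum()
         .assign(Field='Net'))

# Reorder the columns to match the target layout.
net = net[['Country', 'Industry', 'Field', 'Value']]
print(net)
```

Because the sign is attached to each row before aggregation, the result keeps integer values and a negative net shows up as negative instead of being folded into a positive number.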
Answer 3 (score: 2)
You can add these rows to the original DataFrame this way:
df.set_index(['Country','Industry','Field'])\
.unstack()['Value']\
.eval('Net = Import - Export')\
.stack().rename('Value').reset_index()
Output:
   Country Industry   Field  Value
0   Canada   Retail  Export     10
1   Canada   Retail  Import     30
2   Canada   Retail     Net     20
3      USA   Energy  Export      5
4      USA   Energy  Import     20
5      USA   Energy     Net     15
6      USA  Finance  Export     50
7      USA  Finance  Import    100
8      USA  Finance     Net     50
9      USA   Retail  Export     10
10     USA   Retail  Import     80
11     USA   Retail     Net     70
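Answer 4 (score: 2)
This answer relies on the fact that pandas puts the group keys into the MultiIndex of the grouped result (here, a Series). (If there is only one group key, you can use loc instead.)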
>>> s = df.groupby(['Country', 'Industry', 'Field'])['Value'].sum()
>>> s.xs('Import', axis=0, level='Field') - s.xs('Export', axis=0, level='Field')
Country  Industry
Canada   Retail      20
USA      Energy      15
         Finance     50
         Retail      70
Name: Value, dtype: int64
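To illustrate the parenthetical remark about a single group key, here is a small sketch, assuming the same source data as in the question. Grouping by Field alone gives a Series with a plain index, so loc selects each total directly; the names totals and net_total are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# With one group key the result has a plain (non-Multi) index labeled by Field.
totals = df.groupby('Field')['Value'].sum()

# loc picks each label directly; xs is only needed for a MultiIndex level.
net_total = totals.loc['Import'] - totals.loc['Export']
print(net_total)  # 155
```

This gives the overall net across all countries and industries, not the per-group figures produced by the xs version above.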