Pandas DataFrames: Create new rows with calculations across existing rows

Asked: 2019-04-13 21:53:11

Tags: python pandas dataframe

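How can I create new rows from an existing DataFrame by grouping on certain fields (in the example, "Country" and "Industry") and applying arithmetic to another field (in the example, "Field" and "Value")?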

Source DataFrame

import pandas as pd

df = pd.DataFrame({'Country': ['USA','USA','USA','USA','USA','USA','Canada','Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 
                                'Retail', 'Energy', 'Energy', 
                                'Retail', 'Retail'],
                   'Field': ['Import', 'Export','Import', 
                             'Export','Import', 'Export',
                             'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

    Country Industry    Field   Value
0   USA     Finance     Import  100
1   USA     Finance     Export  50
2   USA     Retail      Import  80
3   USA     Retail      Export  10
4   USA     Energy      Import  20
5   USA     Energy      Export  5
6   Canada  Retail      Import  30
7   Canada  Retail      Export  10

Target DataFrame

Net = Import - Export

    Country Industry    Field   Value
0   USA     Finance     Net     50
1   USA     Retail      Net     70
2   USA     Energy      Net     15
3   Canada  Retail      Net     20

5 Answers:

Answer 0 (score: 8)

There are probably many ways to do this. Here is one using groupby and unstack:

(df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
   .sum()
   .unstack('Field')
   .eval('Import - Export')
   .reset_index(name='Value'))

  Country Industry  Value
0     USA  Finance     50
1     USA   Retail     70
2     USA   Energy     15
3  Canada   Retail     20
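
The result above has no Field column, while the target DataFrame does. A minimal sketch of the same chain with the column added through assign follows; the variable name net and the final column reordering are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

net = (
    df.groupby(['Country', 'Industry', 'Field'], sort=False)['Value']
      .sum()
      .unstack('Field')            # one column per Field value
      .eval('Import - Export')     # Series indexed by (Country, Industry)
      .reset_index(name='Value')
      .assign(Field='Net')         # label the computed rows
)

# Reorder the columns to match the target layout (illustrative step).
net = net[['Country', 'Industry', 'Field', 'Value']]
print(net)
```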

Answer 1 (score: 4)

IIUC (if I understand correctly):

df=df.set_index(['Country','Industry'])

# Net = Import - Export; the two Series align on the (Country, Industry) index
Newdf=(df.loc[df.Field=='Import','Value']-df.loc[df.Field=='Export','Value']).reset_index().assign(Field='Net')
Newdf
  Country Industry  Value Field
0     USA  Finance     50   Net
1     USA   Retail     70   Net
2     USA   Energy     15   Net
3  Canada   Retail     20   Net

Or with pivot_table:

df.pivot_table(index=['Country','Industry'],columns='Field',values='Value',aggfunc='sum').\
  diff(axis=1).\
     dropna(axis=1).\
        rename(columns={'Import':'Value'}).\
          reset_index()

Field Country Industry  Value
0      Canada   Retail   20.0
1         USA   Energy   15.0
2         USA  Finance   50.0
3         USA   Retail   70.0
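
The diff(axis=1) step relies on pivot_table ordering the columns alphabetically (Export before Import), and the NaN it introduces turns the values into floats. A hedged alternative sketch subtracts the two columns by name instead; the names wide and net are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Wide table: one row per (Country, Industry), one column per Field value.
wide = df.pivot_table(index=['Country', 'Industry'], columns='Field',
                      values='Value', aggfunc='sum')

# Subtract by column name, so the result does not depend on column order.
net = (wide['Import'] - wide['Export']).rename('Value').reset_index()
print(net)
```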

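Answer 2 (score: 2)

You can use GroupBy.diff(), then recreate the Field column, and finally use DataFrame.dropna. Note that diff() computes Export - Import here because the Import row comes first in each group, and abs() then discards the sign, so the result matches Net = Import - Export only when Import is at least as large as Export. The code also overwrites df in place: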

df['Value'] = df.groupby(['Country', 'Industry'])['Value'].diff().abs()
df['Field'] = 'Net'
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)

print(df)
  Country Industry Field  Value
0     USA  Finance   Net   50.0
1     USA   Retail   Net   70.0
2     USA   Energy   Net   15.0
3  Canada   Retail   Net   20.0
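
Because diff() followed by abs() loses the sign and depends on row order, a sign-preserving variant may be safer. The sketch below is a different technique from the answer above: it negates the Export values and sums per group. It assumes Field contains only 'Import' and 'Export'; the names signed and net are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Keep Import values as they are and negate everything else
# (assumption: the only other Field value is 'Export').
signed = df['Value'].where(df['Field'].eq('Import'), -df['Value'])

net = (
    df.assign(Value=signed)   # works on a copy, df itself is unchanged
      .groupby(['Country', 'Industry'], sort=False, as_index=False)['Value']
      .sum()                  # Import + (-Export) per group
      .assign(Field='Net')
)
print(net)
```

Row order within a group no longer matters here, and a negative net stays negative.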

Answer 3 (score: 2)

You can add these rows to the original DataFrame this way:

df.set_index(['Country','Industry','Field'])\
  .unstack()['Value']\
  .eval('Net = Import - Export')\
  .stack().rename('Value').reset_index()

Output:

   Country Industry   Field  Value
0   Canada   Retail  Export     10
1   Canada   Retail  Import     30
2   Canada   Retail     Net     20
3      USA   Energy  Export      5
4      USA   Energy  Import     20
5      USA   Energy     Net     15
6      USA  Finance  Export     50
7      USA  Finance  Import    100
8      USA  Finance     Net     50
9      USA   Retail  Export     10
10     USA   Retail  Import     80
11     USA   Retail     Net     70
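
The unstack/stack round trip sorts the rows by index. If the original row order should be kept and the Net rows simply appended at the end, one possible sketch uses pd.concat instead; this is a swapped-in technique, and the names wide, net, and combined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# Wide table: one row per (Country, Industry), one column per Field value.
# unstack requires each (Country, Industry, Field) combination to be unique.
wide = df.set_index(['Country', 'Industry', 'Field'])['Value'].unstack('Field')

net = (
    (wide['Import'] - wide['Export'])
    .rename('Value')
    .reset_index()
    .assign(Field='Net')
)

# Append the Net rows below the original rows; columns are matched by name.
combined = pd.concat([df, net], ignore_index=True)
print(combined)
```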

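Answer 4 (score: 2)

This answer takes advantage of the fact that pandas puts the group keys in the MultiIndex of the grouped result. (If there were only one group key, you could use loc.)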

>>> s = df.groupby(['Country', 'Industry', 'Field'])['Value'].sum()
>>> s.xs('Import', axis=0, level='Field') - s.xs('Export', axis=0, level='Field')
Country  Industry
Canada   Retail      20
USA      Energy      15
         Finance     50
         Retail      70
Name: Value, dtype: int64
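
For the single-key case mentioned in the parenthetical, a minimal sketch might look like this. Grouping by Field alone is an illustrative assumption; it yields one overall net figure rather than one per Country and Industry.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Canada', 'Canada'],
                   'Industry': ['Finance', 'Finance', 'Retail', 'Retail',
                                'Energy', 'Energy', 'Retail', 'Retail'],
                   'Field': ['Import', 'Export', 'Import', 'Export',
                             'Import', 'Export', 'Import', 'Export'],
                   'Value': [100, 50, 80, 10, 20, 5, 30, 10]})

# With a single group key the result has a plain Index, so loc is enough.
s = df.groupby('Field')['Value'].sum()
total_net = s.loc['Import'] - s.loc['Export']
print(total_net)  # 230 - 75 = 155
```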