Subtracting after a groupby

Time: 2015-12-01 01:10:06

Tags: python pandas

I have a dataframe like this:

   Allotment    Date        NDII_Mean
   Arnstson     19900619    0.073023
   A_Annex      19900619    0.131290
   Arnstson     19900620    0.045553
   A_Annex      19900620    0.688850

I want to group by Allotment and then, within each group, subtract the NDII_Mean value for date 19900619 from the value for date 19900620.

I want my output to look like this:

Allotment      NDII_Mean
Arnstson       -0.02747
A_Annex         0.55756 

2 answers:

Answer 0: (score: 1)

import pandas as pd

# For each group (rows in date order): later NDII_Mean minus earlier NDII_Mean
difference = lambda x: [x['Allotment'][0], x.loc[1, 'NDII_Mean'] - x.loc[0, 'NDII_Mean']]
df_diffs = pd.DataFrame([difference(g.reset_index(drop=True)) for _, g in df.groupby('Allotment')])
df_diffs.columns = ['Allotment', 'NDII_Mean']
print(df_diffs)

  Allotment  NDII_Mean
0   A_Annex    0.55756
1  Arnstson   -0.02747
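The same per-group difference can probably be written without building the rows by hand. The following is only a sketch, not the answerer's code: it assumes each Allotment has exactly the two dates of interest, and the names `ordered` and `diffs` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['Arnstson', 19900619, 0.073023],
     ['A_Annex', 19900619, 0.131290],
     ['Arnstson', 19900620, 0.045553],
     ['A_Annex', 19900620, 0.688850]],
    columns=['Allotment', 'Date', 'NDII_Mean'])

# Sort by date so that, within each group, the earlier date comes first.
ordered = df.sort_values('Date')

# Per Allotment: last (latest) NDII_Mean minus first (earliest) NDII_Mean.
# Assumption: every group contains exactly the two dates being compared.
diffs = (ordered.groupby('Allotment')['NDII_Mean']
                .agg(lambda s: s.iloc[-1] - s.iloc[0])
                .reset_index())
print(diffs)
```

If a group holds more than two dates, this returns latest minus earliest, which may not be what you want.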

Answer 1: (score: 1)

You can use reshape strategies (pivot) so that each date becomes its own column; the subtraction then falls out naturally.

import pandas as pd

df = pd.DataFrame([['Arnstson' ,   19900619 ,  0.073023],
                   ['A_Annex'  ,   19900619 ,  0.131290],
                   ['Arnstson' ,   19900620 ,  0.045553],
                   ['A_Annex'  ,   19900620 ,  0.688850]],
                 columns=['Allotment', 'Date', 'NDII_Mean'])
dfreshape = df.pivot(index='Allotment', columns='Date')
#           NDII_Mean          
# Date       19900619  19900620
# Allotment                    
# A_Annex    0.131290  0.688850
# Arnstson   0.073023  0.045553    

Then you can simply use indexing/slicing to get the desired result:

dfreshape['NDII_Mean',19900620] - dfreshape['NDII_Mean',19900619]
# Allotment
# A_Annex     0.55756
# Arnstson   -0.02747
# dtype: float64

Full code:

dfreshape = df.pivot(index='Allotment', columns='Date')
dfreshape['NDII_Mean',19900620] - dfreshape['NDII_Mean',19900619]
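
As a possible variation on the pivot approach, passing `values='NDII_Mean'` should give plain date columns instead of a two-level column index, so each date can be selected with a single key. This is a sketch rather than part of the original answer; the names `wide` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Arnstson', 19900619, 0.073023],
     ['A_Annex', 19900619, 0.131290],
     ['Arnstson', 19900620, 0.045553],
     ['A_Annex', 19900620, 0.688850]],
    columns=['Allotment', 'Date', 'NDII_Mean'])

# One row per Allotment, one column per Date, cells hold NDII_Mean.
wide = df.pivot(index='Allotment', columns='Date', values='NDII_Mean')

# Later date minus earlier date, aligned on the Allotment index.
result = wide[19900620] - wide[19900619]
print(result)
```

Note that `pivot` raises an error if any (Allotment, Date) pair appears more than once; in that case `pivot_table` with an explicit aggregation may be the better fit.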