From a dataframe

Posted: 2017-11-25 00:14:09

Tags: python pandas

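I'm trying to create a DataFrame (df) that holds the monthly sum, mean and standard deviation of a time series taken from another DataFrame called performanceData. The head of performanceData looks like this: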

             total_cost
date
2006-03-04 -1465.052092
2006-04-04 -1213.508277
2006-05-04 -1459.290503
2006-06-04 -1460.119361
2006-07-04  -772.482609
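To make the answers below easy to try out, here is a minimal sketch that rebuilds the five rows shown above. It assumes the index is a DatetimeIndex named "date" and uses the column name total_cost from the printed head (note that the code below refers to a column called 'total_pnl_pos' instead, which does not appear in this sample).

```python
import pandas as pd

# Rebuild the sample head: one value per month, indexed by a DatetimeIndex named "date"
performanceData = pd.DataFrame(
    {"total_cost": [-1465.052092, -1213.508277, -1459.290503,
                    -1460.119361, -772.482609]},
    index=pd.DatetimeIndex(
        ["2006-03-04", "2006-04-04", "2006-05-04", "2006-06-04", "2006-07-04"],
        name="date",
    ),
)

print(performanceData)
```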

I tried the following:

import pandas as pd

def getMonthlyPerf(performanceData):

    # Copy the year and month of the DatetimeIndex into helper columns
    performanceData['Year']=performanceData.index.year
    performanceData['Month']=performanceData.index.month

    df = pd.DataFrame()
    df.columns=['new','new1','new3']


    df['Sum']=performanceData.groupby(['Year','Month'])['total_pnl_pos'].sum()
    df['Ave']=performanceData.groupby(['Year','Month'])['total_pnl_pos'].mean()
    df['Std']=performanceData.groupby(['Year','Month'])['total_pnl_pos'].std()

    return df

but I can't get it to work; I currently get this error:

AttributeError: 'list' object has no attribute 'name'

Here is an example of the output I'm trying to create:

                  sum     ave    std
Month   Year 
1        2006      123    86.32  2.32
2        2006      546    625    6.23
3        2006      654    65     6.21

2 answers:

Answer 0 (score: 0)

Option 1: groupby

import pandas as pd

df.index = pd.to_datetime(df.index)

new_df = df.groupby([df.index.year, df.index.month]).agg(['sum', 'mean', 'std'])
new_df.index.set_names(['Year', 'Month'], inplace = True)
new_df.reset_index(inplace = True)


   Year Month   total_cost
                       sum         mean std
0  2006     3 -1465.052092 -1465.052092 NaN
1  2006     4 -1213.508277 -1213.508277 NaN
2  2006     5 -1459.290503 -1459.290503 NaN
3  2006     6 -1460.119361 -1460.119361 NaN
4  2006     7  -772.482609  -772.482609 NaN
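The std column is NaN because each month in the sample contains a single row, and pandas' default standard deviation is the sample estimate (ddof=1), which is undefined for one observation. The sketch below runs Option 1 end to end on the sample rows (assuming string dates in the index, as the to_datetime call implies) and, for comparison, computes the population std (ddof=0), which is 0.0 for a single value. Whether ddof=0 is appropriate depends on what the statistic is meant to describe.

```python
import pandas as pd

# Sample rows from the question, with the dates still stored as strings
df = pd.DataFrame(
    {"total_cost": [-1465.052092, -1213.508277, -1459.290503,
                    -1460.119361, -772.482609]},
    index=["2006-03-04", "2006-04-04", "2006-05-04", "2006-06-04", "2006-07-04"],
)
df.index = pd.to_datetime(df.index)

keys = [df.index.year, df.index.month]

# Option 1: sum/mean/std per (year, month); columns become a MultiIndex
new_df = df.groupby(keys).agg(["sum", "mean", "std"])
new_df.index = new_df.index.set_names(["Year", "Month"])
new_df = new_df.reset_index()
print(new_df)  # std is NaN: one observation per month with ddof=1

# Population standard deviation is defined for a single observation
pop_std = df.groupby(keys)["total_cost"].std(ddof=0)
print(pop_std)
```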

Option 2: using to_period

df.index = pd.to_datetime(df.index)

df.index = df.index.to_period('M')
df.groupby(df.index).agg(['sum', 'mean', 'std']).reset_index()
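Option 2 replaces the year/month pair with a single monthly period key: to_period('M') maps every timestamp to its calendar month, so all dates in the same month fall into one group. A self-contained sketch on the sample rows (the string-dated index is an assumption carried over from Option 1):

```python
import pandas as pd

df = pd.DataFrame(
    {"total_cost": [-1465.052092, -1213.508277, -1459.290503,
                    -1460.119361, -772.482609]},
    index=["2006-03-04", "2006-04-04", "2006-05-04", "2006-06-04", "2006-07-04"],
)
df.index = pd.to_datetime(df.index)

# e.g. Timestamp('2006-03-04') -> Period('2006-03', 'M')
df.index = df.index.to_period("M")

# One group per calendar month; the period ends up in the first column
monthly = df.groupby(df.index).agg(["sum", "mean", "std"]).reset_index()
print(monthly)
```

A single period column can be more convenient than separate Year and Month columns when the result is later plotted or joined on month.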

Answer 1 (score: 0)

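I think there's no need to copy the year and month into columns or to create a new empty df; you can group directly by df.index.year and df.index.month: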

import pandas as pd

def getMonthlyPerf(performanceData):
    # Rename the 'mean' column to 'ave' to match the desired output
    d = {'mean':'ave'}
    return (performanceData.groupby([performanceData.index.year,
                                    performanceData.index.month])['total_pnl_pos']
                           .agg(['sum', 'mean', 'std'])
                           .rename_axis(('Year','Month'))
                           .rename(columns=d)
                           .reset_index())

print (getMonthlyPerf(df))
   Year  Month           sum           ave          std
0  2006      3  -1465.052092  -1465.052092          NaN
1  2006      4   2660.836016    332.604502  7533.165341
2  2006      5 -11375.251280 -11375.251280          NaN
3  2006      6  19918.807750  19918.807750          NaN
4  2006      7   4926.596073   4926.596073          NaN
5  2006     10  -1533.561324  -1533.561324          NaN
6  2006     11   8161.716319   8161.716319          NaN
7  2006     12  -4679.482760  -4679.482760          NaN
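As an alternative not used in either answer, pandas' named aggregation (available since pandas 0.25) lets agg set the output column names directly, so the separate rename step is unnecessary. The helper name get_monthly_perf and its column parameter are hypothetical; the default "total_cost" is assumed from the sample head.

```python
import pandas as pd

def get_monthly_perf(performance_data, column="total_cost"):
    """Hypothetical helper: monthly sum/mean/std of one column, indexed by (Year, Month)."""
    idx = performance_data.index
    return (
        performance_data.groupby([idx.year, idx.month])[column]
        # keyword = output column name, value = aggregation function
        .agg(sum="sum", ave="mean", std="std")
        .rename_axis(["Year", "Month"])
    )

data = pd.DataFrame(
    {"total_cost": [-1465.052092, -1213.508277, -1459.290503,
                    -1460.119361, -772.482609]},
    index=pd.DatetimeIndex(
        ["2006-03-04", "2006-04-04", "2006-05-04", "2006-06-04", "2006-07-04"],
        name="date",
    ),
)
result = get_monthly_perf(data)
print(result)
```

Keeping Year and Month in the index (no reset_index) is closer to the layout sketched in the question; calling .swaplevel() on the result would put Month first.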