How to reshape a pandas DataFrame

Time: 2019-01-21 11:01:56

Tags: python pandas dataframe

I have a pandas DataFrame like the following:

import pandas as pd
import numpy as np
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

index = pd.date_range(start, end)

Cols = ['Returns']



df = pd.DataFrame(abs(np.random.randn(366,1)), index=index, columns=Cols)

I need to reshape it so that the index is the year and the columns are the months. The expected output has this layout:

start1 = 2011
end1 = 2012

index1 = [start1, end1]  # integer years, not the datetime objects start/end
cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df_out = pd.DataFrame(abs(np.random.randn(2,12)), index=index1, columns=cols2)

Each individual value (one per year and month) can be either the sum or the mean. I tried grouping the DataFrame as follows:

DFList = []
for group in df.groupby(df.index.month):
    DFList.append(group[1])


r2 = pd.concat([DFList[0], DFList[1] ,DFList[2], DFList[3], DFList[4], 
DFList[5],DFList[6],DFList[7],DFList[8], DFList[9], 
DFList[10],DFList[11]],ignore_index=True,axis=1)
cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
r2.columns=cols2

At this point I am stuck and cannot get any further. Please suggest a way forward. Thanks in advance.

1 Answer:

Answer 0: (score: 2)

Use pivot_table with DatetimeIndex.year for the index and DatetimeIndex.month for the columns, and specify the aggregation function:

df = df.pivot_table(index=df.index.year, 
                    columns=df.index.month,
                    values='Returns', 
                    aggfunc='sum')

print (df)
             1         2          3          4          5          6   \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742   
2012   0.023056       NaN        NaN        NaN        NaN        NaN   

             7          8          9          10         11        12  
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568  
2012        NaN        NaN        NaN        NaN        NaN       NaN
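
For comparison, the same table can probably also be built with groupby followed by unstack, which is closer to what the question was attempting. This is only a minimal sketch: the seeded generator `rng` and the names `by_month` and `pivoted` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame (seeded so the run is repeatable).
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame(np.abs(rng.standard_normal((len(index), 1))),
                  index=index, columns=["Returns"])

# Group by (year, month), aggregate, then move the month level into columns.
by_month = (df["Returns"]
            .groupby([df.index.year, df.index.month])
            .sum()
            .unstack())

# The same table via pivot_table, as in the answer above.
pivoted = df.pivot_table(index=df.index.year,
                         columns=df.index.month,
                         values="Returns",
                         aggfunc="sum")

print(by_month)
```

Months with no data (February to December 2012 here) come out as NaN in both versions.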

If you need the months named and in the correct order, one solution is to use an ordered CategoricalIndex together with DatetimeIndex.strftime:

cols2 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = df.pivot_table(index=df.index.year, 
                    columns=pd.CategoricalIndex(df.index.strftime('%b'), 
                                                ordered=True, 
                                                categories=cols2),
                    values='Returns', 
                    aggfunc='sum')

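Another solution is to use DataFrame.reindex to restore calendar order after pivoting, since the string month labels are otherwise sorted alphabetically: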

cols2=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
df = (df.pivot_table(index=df.index.year, 
                    columns=df.index.strftime('%b'),
                    values='Returns', 
                    aggfunc='sum').reindex(columns=cols2))


print (df)
            Jan       Feb        Mar        Apr        May        Jun  \
2011  26.049121  20.05826  29.157931  25.513904  19.148302  23.065742   
2012   0.023056       NaN        NaN        NaN        NaN        NaN   

            Jul        Aug        Sep        Oct        Nov       Dec  
2011  23.049623  20.075674  23.715332  28.650968  27.337803  24.93568  
2012        NaN        NaN        NaN        NaN        NaN       NaN
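
The question mentions that the mean would also be acceptable. The sketch below swaps in aggfunc='mean' and, instead of strftime('%b') (whose output depends on the active locale), pivots on the month number and renames the columns from the explicit cols2 list. The seeded generator `rng` and the name `out` are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

cols2 = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Rebuild the question's sample frame (seeded so the run is repeatable).
rng = np.random.default_rng(0)
index = pd.date_range("2011-01-01", "2012-01-01")
df = pd.DataFrame(np.abs(rng.standard_normal((len(index), 1))),
                  index=index, columns=["Returns"])

# Pivot on the numeric month so the columns are already in calendar order,
# then map 1..12 to the month abbreviations.
out = (df.pivot_table(index=df.index.year,
                      columns=df.index.month,
                      values="Returns",
                      aggfunc="mean")
         .rename(columns=dict(zip(range(1, 13), cols2))))

print(out)
```

Because the numeric month columns are already in calendar order, no CategoricalIndex or reindex step is needed with this approach.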