My df looks like this:
Total
language Julia Python R SQLite
date
2015-03-01 NaN NaN 17.0 NaN
2015-04-01 NaN 156.0 189.0 NaN
2015-05-01 13.0 212.0 202.0 NaN
The index is monthly, and I want it quarterly:
df.resample("Q").sum()
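A minimal sketch of what this call does, assuming a small hypothetical frame (only the first three R values come from the question; the rest are invented so the quarterly sums line up with the output below). It passes pd.offsets.QuarterEnd() instead of the "Q" string, since newer pandas versions deprecate that alias in favour of "QE". Note that in current pandas, summing a bin that contains only NaN gives 0.0 rather than NaN; passing min_count=1 keeps the NaN shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: Julia is all NaN, R partly taken from the question
idx = pd.date_range("2015-03-01", periods=7, freq="MS")
df = pd.DataFrame(
    {"Julia": [np.nan] * 7, "R": [17.0, 189.0, 202.0, 203.0, 400.0, 420.0, 430.0]},
    index=idx,
)

# QuarterEnd() bins by calendar quarter; each bin is labelled by its last day
default = df.resample(pd.offsets.QuarterEnd()).sum()
# min_count=1: a bin with no non-NaN values stays NaN instead of becoming 0.0
keep_nan = df.resample(pd.offsets.QuarterEnd()).sum(min_count=1)

print(default)
print(keep_nan)
```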
which gives me this:
Total
language Julia Python R SQLite
date
2015-03-31 NaN NaN 17.0 NaN
2015-06-30 22.0 677.0 594.0 26.0
2015-09-30 37.0 1410.0 1250.0 146.0
But instead of showing the quarter's end date, I want the index to read "Start month - End month, Year". Desired df:
Total
language Julia Python R SQLite
Jan - Mar, 2015 NaN NaN 17.0 NaN
Apr - Jun, 2015 22.0 677.0 594.0 26.0
Jul - Sep, 2015 37.0 1410.0 1250.0 146.0
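Is there a pandas way to do this? This is how I did it, but it's very hacky and I'm sure there is a better way (the resample docs are short on examples...):
import pandas as pd

def quarterlyMonthNames(x):
    # x.name is the row's index label, i.e. the quarter-end Timestamp
    start_date = x.name - pd.offsets.MonthBegin(3)
    return start_date.strftime('%b') + " - " + x.name.strftime('%b, %Y')

# returns a Series with one label per row
df["Total"].apply(quarterlyMonthNames, axis=1)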
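Since the label depends only on the index and not on the row values, one possible cleanup is to map a function over the index instead of applying it row by row. A minimal sketch, assuming a quarter-end index like the one resample("Q") produces; quarter_label is a hypothetical helper name, and the data values are taken from the R column above.

```python
import pandas as pd

def quarter_label(ts):
    """Hypothetical helper: 'Jan - Mar, 2015' for a quarter-end Timestamp."""
    # Only valid for quarter-end dates: stepping back three month-starts
    # from e.g. 2015-03-31 lands on 2015-01-01
    start = ts - pd.offsets.MonthBegin(3)
    return f"{start.strftime('%b')} - {ts.strftime('%b, %Y')}"

# Quarter-end index like the resampled frame's
qidx = pd.date_range("2015-03-31", periods=3, freq=pd.offsets.QuarterEnd())
df = pd.DataFrame({"R": [17.0, 594.0, 1250.0]}, index=qidx)

# Map over the index labels rather than over the rows
df.index = df.index.map(quarter_label)
print(df)
```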
Answer (score: 1)
Use periods:
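# df is the quarterly frame returned by df.resample("Q").sum()
idx = df.index.to_period('Q')
# first month of each quarter ('Jan') and last month with year ('Mar 2015')
df.index = ['{0[0]}-{0[1]}'.format(x) for x in zip(idx.asfreq('M', 's').strftime('%b'),
                                                    idx.asfreq('M', 'e').strftime('%b %Y'))]
print(df)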
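The same period idea can also be applied before aggregating: grouping by df.index.to_period("Q") gives a PeriodIndex directly, so no quarter-end dates appear at all. A minimal sketch, assuming hypothetical monthly data (R values partly from the question) and using the question's "Jan - Mar, 2015" format rather than the answer's "Jan-Mar 2015".

```python
import pandas as pd

# Hypothetical monthly data standing in for the question's df
idx = pd.date_range("2015-03-01", periods=7, freq="MS")
df = pd.DataFrame({"R": [17.0, 189.0, 202.0, 203.0, 400.0, 420.0, 430.0]}, index=idx)

# Group by calendar quarter; the result is indexed by quarterly Periods
quarterly = df.groupby(df.index.to_period("Q")).sum()

q = quarterly.index
start = q.asfreq("M", how="start").strftime("%b")   # e.g. 'Jan'
end = q.asfreq("M", how="end").strftime("%b, %Y")   # e.g. 'Mar, 2015'
quarterly.index = [f"{s} - {e}" for s, e in zip(start, end)]
print(quarterly)
```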
Total
language Julia Python R SQLite
Jan-Mar 2015 NaN NaN 17.0 NaN
Apr-Jun 2015 22.0 677.0 594.0 26.0
Jul-Sep 2015 37.0 1410.0 1250.0 146.0
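Or, more simply:
# again starting from the quarterly frame (DatetimeIndex of quarter-end dates)
idx2 = df.index.strftime('%b %Y')
# subtracting MonthBegin(3) from a quarter-end date gives the quarter's first month
idx1 = (df.index - pd.offsets.MonthBegin(3)).strftime('%b')
df.index = ['{0[0]}-{0[1]}'.format(x) for x in zip(idx1, idx2)]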