Question

My df looks like this:

             Total
language     Julia  Python       R  SQLite
date
2015-03-01     NaN     NaN    17.0     NaN
2015-04-01     NaN   156.0   189.0     NaN
2015-05-01    13.0   212.0   202.0     NaN

The index is monthly, and I want it to be quarterly:

df.resample("Q").sum()

which gives me this:

             Total
language     Julia  Python       R  SQLite
date
2015-03-31     NaN     NaN    17.0     NaN
2015-06-30    22.0   677.0   594.0    26.0
2015-09-30    37.0  1410.0  1250.0   146.0
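A minimal sketch that reproduces the resampling step using only the three monthly rows shown above. The full data isn't in the question, so these sums differ from the output above. It assumes the column layout displayed there: a "Total" level over a "language" level. Two version details matter here. First, pandas 2.2 deprecated the "Q" alias in favor of "QE", so the sketch passes the equivalent offset object, pd.offsets.QuarterEnd(startingMonth=12), which gives calendar quarters ending in Mar/Jun/Sep/Dec. Second, since pandas 0.22 the sum of an all-NaN group is 0, so min_count=1 is used to keep the NaN cells seen above.

```python
import numpy as np
import pandas as pd

# Rebuild the three monthly rows from the question
# (assumed layout: a "Total" column level above a "language" level).
columns = pd.MultiIndex.from_product(
    [["Total"], ["Julia", "Python", "R", "SQLite"]], names=[None, "language"]
)
monthly = pd.DataFrame(
    [
        [np.nan, np.nan, 17.0, np.nan],
        [np.nan, 156.0, 189.0, np.nan],
        [13.0, 212.0, 202.0, np.nan],
    ],
    index=pd.DatetimeIndex(["2015-03-01", "2015-04-01", "2015-05-01"], name="date"),
    columns=columns,
)

# QuarterEnd(startingMonth=12) is the offset behind the "Q" / "QE" alias;
# min_count=1 keeps groups with no values as NaN instead of 0.
quarterly = monthly.resample(pd.offsets.QuarterEnd(startingMonth=12)).sum(min_count=1)
print(quarterly)
```

The resulting index holds quarter-end dates (2015-03-31, 2015-06-30), which is why the labels in the question show the end of each quarter.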

But instead of the end date, I want the index displayed as "Start month - End month, year". Desired df:

                  Total
language          Julia  Python       R  SQLite
Jan - Mar, 2015     NaN     NaN    17.0     NaN
Apr - Jun, 2015    22.0   677.0   594.0    26.0
Jul - Sep, 2015    37.0  1410.0  1250.0   146.0

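Is there a pandas way to do this? I did it like this, but it's very dirty and I'm sure there's a better way (the resample documentation is short on examples...):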

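import pandas as pd

def quarterlyMonthNmaes(x):
    # x.name is the row's index label, i.e. the quarter-end Timestamp
    start_date = x.name - pd.offsets.MonthBegin(3)
    return start_date.strftime('%b') + " - " + x.name.strftime('%b, %Y')

# df is the quarterly frame from df.resample("Q").sum()
df.index = df["Total"].apply(quarterlyMonthNmaes, axis=1)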

Answer 1

Use periods:

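# df is the quarterly frame from df.resample("Q").sum()
idx = df.index.to_period('Q')
# first month of each quarter (e.g. 'Jan') and last month plus year (e.g. 'Mar 2015')
df.index = ['{0[0]}-{0[1]}'.format(x) for x in zip(idx.asfreq('M', 's').strftime('%b'),
                                                   idx.asfreq('M', 'e').strftime('%b %Y'))]
print(df)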

               Total
language       Julia  Python       R  SQLite
Jan-Mar 2015     NaN     NaN    17.0     NaN
Apr-Jun 2015    22.0   677.0   594.0    26.0
Jul-Sep 2015    37.0  1410.0  1250.0   146.0
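Here is a self-contained sketch of the same period-based idea. It produces the exact "Jan - Mar, 2015" format requested in the question. It starts from a hand-built DatetimeIndex that stands in for the resampled df.index, so it runs without the original data. On the real frame you would assign the resulting list to df.index.

```python
import pandas as pd

# Hand-built stand-in for the quarter-end index produced by resample("Q").
quarter_ends = pd.DatetimeIndex(["2015-03-31", "2015-06-30", "2015-09-30"])

# Map each timestamp to the quarterly Period that contains it.
quarters = quarter_ends.to_period("Q")

# asfreq with how="start"/"end" gives the first/last monthly Period of each quarter.
first_month = quarters.asfreq("M", how="start").strftime("%b")
last_month = quarters.asfreq("M", how="end").strftime("%b, %Y")

labels = [f"{s} - {e}" for s, e in zip(first_month, last_month)]
print(labels)
# ['Jan - Mar, 2015', 'Apr - Jun, 2015', 'Jul - Sep, 2015']
```

Because to_period("Q") maps any timestamp to the quarter that contains it, this route works whether the index holds quarter-end or quarter-start dates.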

Or, more simply:

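# assumes df.index is still the quarter-end DatetimeIndex from resample("Q")
idx2 = df.index.strftime('%b %Y')
# quarter-end date minus three month starts = first day of the quarter
idx1 = (df.index - pd.offsets.MonthBegin(3)).strftime('%b')
df.index = ['{0[0]}-{0[1]}'.format(x) for x in zip(idx1, idx2)]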
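Here is a small check of why the MonthBegin(3) subtraction lands on the start of the quarter. It uses a hand-built quarter-end index as a stand-in for df.index. The sketch also shows the caveat: the trick depends on the timestamps being quarter-end dates, because a date that is already the 1st of a month counts as one of the three steps.

```python
import pandas as pd

# Hand-built stand-in for the quarter-end index produced by resample("Q").
quarter_ends = pd.DatetimeIndex(["2015-03-31", "2015-06-30", "2015-09-30"])

# From a month-end date, the first MonthBegin step rolls back to the 1st of the
# same month and the next two step back whole months: the quarter's first day.
quarter_starts = quarter_ends - pd.offsets.MonthBegin(3)
print(quarter_starts.strftime("%Y-%m-%d").tolist())
# ['2015-01-01', '2015-04-01', '2015-07-01']

# Caveat: a date already on the 1st is itself a month start, so all three
# steps move back a whole month and the result falls in the previous quarter.
print((pd.Timestamp("2015-03-01") - pd.offsets.MonthBegin(3)).strftime("%Y-%m-%d"))
# 2014-12-01
```

If the index might contain such dates, the to_period('Q') approach above is the safer choice, since it maps each timestamp to its containing quarter.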

Pandas resample to quarterly, showing start and end month
