How to convert a groupby DataFrame to a monthly datetime series in Python Pandas

Time: 2013-07-23 20:09:41

Tags: python pandas

monthly_dividend
1994  10   NaN
      11   NaN
      12   NaN
      12   NaN
...
2012  4          NaN
      5          NaN
      6          NaN
      7     1.746622
      8     1.607685
      9     1.613936
      10    1.620187
      11    1.626125
      12    1.632375
2013  1     1.667792
      2     1.702897
      3     1.738314
      4     1.773731
      5     1.808835
      6     1.844252
Length: 225

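My code produces something like the output above. It is the result of a groupby on a DataFrame, indexed by year and month, but I would like to turn it back into a regular TimeSeries like the one below. asfreq('M') no longer works on the grouped result, so I am not sure whether there is an easy way to convert it.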

dividends
1994-10-31    0.0750
1994-11-30    0.0750
1994-12-31    0.0750
1995-12-31    0.3450
...
2012-03-31    0.145812
2012-04-30    0.145812
2012-05-31    0.145812
2012-06-30    0.146125
2012-07-31    0.146125
2012-08-31    0.151125
2012-09-30    0.151438
2012-10-31    0.151438
2012-11-30    0.151438
2012-12-31    0.151750
2013-01-31    0.180917
2013-02-28    0.180917
2013-03-31    0.181229
2013-04-30    0.181229
2013-05-31    0.181229
Freq: M, Length: 224

1 Answer:

Answer 0 (score: 1)

Create some sample data shaped like yours

In [172]: df = pd.DataFrame(np.random.randn(200,1),columns=['A'],index=pd.date_range('2000-07',periods=200,freq='M'))

In [173]: df['month'] = df.index.month

In [174]: df['year'] = df.index.year

In [175]: df = df.reset_index(drop=True).set_index(['year','month'])

In [176]: df
Out[176]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 200 entries, (2000, 7) to (2017, 2)
Data columns (total 1 columns):
A    200  non-null values
dtypes: float64(1)

In [177]: df.head()
Out[177]: 
                   A
year month          
2000 7      0.084256
     8      2.507213
     9     -0.642151
     10     1.972307
     11     0.926586

This creates a PeriodIndex with monthly frequency. Note that iterating over the MultiIndex yields (year, month) tuples of integers.

In [179]: pd.PeriodIndex([ pd.Period(year=year,month=month,freq='M') for year, month in df.index ])
Out[179]: 
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2000-07, ..., 2017-02]
length: 200
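
As a self-contained sketch of this step, the snippet below builds a small Series indexed by (year, month) and turns the tuples into monthly periods. The sample values and the names `s`, `first`, and `periods` are made up for illustration; the `int()` casts are only a defensive assumption in case the index yields NumPy integers.

```python
import pandas as pd

# Hypothetical sample: a Series indexed by (year, month), shaped like the
# result of grouping on [index.year, index.month].
index = pd.MultiIndex.from_tuples(
    [(2012, 11), (2012, 12), (2013, 1), (2013, 2)],
    names=["year", "month"],
)
s = pd.Series([1.626125, 1.632375, 1.667792, 1.702897], index=index)

# Iterating over a MultiIndex yields one tuple per row.
first = list(s.index)[0]

# Build one monthly Period per (year, month) tuple.
periods = pd.PeriodIndex(
    [pd.Period(year=int(year), month=int(month), freq="M") for year, month in s.index]
)
print(periods)
```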

Convert it directly to a DatetimeIndex

In [180]: new_index = pd.PeriodIndex([ pd.Period(year=year,month=month,freq='M') for year, month in df.index ]).to_timestamp()

In [181]: new_index
Out[181]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-07-01 00:00:00, ..., 2017-02-01 00:00:00]
Length: 200, Freq: MS, Timezone: None

At this point you can assign it as the new index

In [182]: df.index = new_index

In [183]: df
Out[183]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 200 entries, 2000-07-01 00:00:00 to 2017-02-01 00:00:00
Freq: MS
Data columns (total 1 columns):
A    200  non-null values
dtypes: float64(1)

In [184]: df.head()
Out[184]: 
                   A
2000-07-01  0.084256
2000-08-01  2.507213
2000-09-01 -0.642151
2000-10-01  1.972307
2000-11-01  0.926586
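
Putting the steps together, a small helper might look like the following sketch. The name `to_monthly_series` is hypothetical (it is not a pandas function), and the code assumes the index holds exactly (year, month) integer pairs.

```python
import pandas as pd


def to_monthly_series(grouped, how="start"):
    """Return a copy of `grouped` indexed by monthly timestamps.

    `grouped` is assumed to be a Series or DataFrame whose index holds
    (year, month) integer pairs.
    """
    periods = pd.PeriodIndex(
        [pd.Period(year=int(y), month=int(m), freq="M") for y, m in grouped.index]
    )
    result = grouped.copy()
    # Replace the (year, month) MultiIndex with a DatetimeIndex.
    result.index = periods.to_timestamp(how=how)
    return result


# Made-up data shaped like the grouped output in the question.
idx = pd.MultiIndex.from_tuples([(2013, 4), (2013, 5), (2013, 6)])
grouped = pd.Series([1.773731, 1.808835, 1.844252], index=idx)

monthly = to_monthly_series(grouped)
print(monthly)
```

The helper returns a copy, so the grouped object keeps its original (year, month) index.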

By default, to_timestamp returns the first day of the month. To get the end of the month instead, pass how='e'

In [1]: pr = pd.period_range('200001',periods=20,freq='M')

In [2]: pr
Out[2]: 
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2000-01, ..., 2001-08]
length: 20

In [3]: pr.to_timestamp()
Out[3]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-01 00:00:00, ..., 2001-08-01 00:00:00]
Length: 20, Freq: MS, Timezone: None

In [4]: pr.to_timestamp(how='e')
Out[4]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-01-31 00:00:00, ..., 2001-08-31 00:00:00]
Length: 20, Freq: M, Timezone: None
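
The difference between the two anchors can be checked with a short sketch. One caveat, offered as an observation about recent pandas releases rather than the version that produced the output above: there, the end anchor returns the last instant of the month (23:59:59.999999999) rather than midnight, so `normalize()` is used here to get plain dates. The variable names are illustrative.

```python
import pandas as pd

pr = pd.period_range("2000-01", periods=20, freq="M")

starts = pr.to_timestamp()          # first day of each month
ends = pr.to_timestamp(how="end")   # end of each month
ends_midnight = ends.normalize()    # drop the time-of-day component

print(starts[0], ends_midnight[0])
```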