Pandas: minimum date per group in a DataFrame

Time: 2020-04-15 10:19:40

Tags: python pandas datetime pandas-groupby

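I have a pandas DataFrame that contains multiple series (identified by PERMNO), with one observation per date for each series. I want to define a start date and an end date for each series.

First, I tried: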

df['startdate'] = df['date'].min()

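This returns the following output, where every row gets the minimum date of the whole DataFrame rather than of its own series: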

         PERMNO       date  ... pricereturn  startdate
0         10000 1986-01-31  ...           C 1925-12-31
1         10000 1986-02-28  ...   -0.257143 1925-12-31
2         10000 1986-03-31  ...    0.365385 1925-12-31
3         10000 1986-04-30  ...   -0.098592 1925-12-31
4         10000 1986-05-30  ...   -0.222656 1925-12-31
...         ...        ...  ...         ...        ...
3599488   93436 2018-08-31  ...    0.011806 1925-12-31
3599489   93436 2018-09-28  ...   -0.122290 1925-12-31
3599490   93436 2018-10-31  ...    0.274011 1925-12-31
3599491   93436 2018-11-30  ...    0.039013 1925-12-31
3599492   93436 2018-12-31  ...   -0.050445 1925-12-31

When I try the same command with a groupby on PERMNO, the code returns NaT for my start date in every row.

df['startdate'] = df.groupby('PERMNO')['date'].min()
         PERMNO       date  ... pricereturn  startdate
0         10000 1986-01-31  ...           C        NaT
1         10000 1986-02-28  ...   -0.257143        NaT
2         10000 1986-03-31  ...    0.365385        NaT
3         10000 1986-04-30  ...   -0.098592        NaT
4         10000 1986-05-30  ...   -0.222656        NaT
...         ...        ...  ...         ...        ...
3599488   93436 2018-08-31  ...    0.011806        NaT
3599489   93436 2018-09-28  ...   -0.122290        NaT
3599490   93436 2018-10-31  ...    0.274011        NaT
3599491   93436 2018-11-30  ...    0.039013        NaT
3599492   93436 2018-12-31  ...   -0.050445        NaT

Can anyone tell me what went wrong, or how I can fix it?

A sample of my data:

{'PERMNO': {0: 10000, 1: 10000, 2: 10000, 3: 10000, 4: 10000, 5: 93436, 6: 93436, 7: 93436, 8: 93436, 9: 93436}, 'date': {0: Timestamp('1986-01-31 00:00:00'), 1: Timestamp('1986-02-28 00:00:00'), 2: Timestamp('1986-03-31 00:00:00'), 3: Timestamp('1986-04-30 00:00:00'), 4: Timestamp('1986-05-30 00:00:00'), 5: Timestamp('1986-06-30 00:00:00'), 6: Timestamp('1986-07-31 00:00:00'), 7: Timestamp('1986-08-29 00:00:00'), 8: Timestamp('1986-09-30 00:00:00'), 9: Timestamp('1986-10-31 00:00:00')}}
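A likely explanation, sketched here on the sample data above: `df.groupby('PERMNO')['date'].min()` returns a Series with one row per group, indexed by PERMNO (10000, 93436), while `df` is indexed by row labels 0, 1, 2, and so on. Column assignment aligns on the index, so no labels match and every row becomes NaT. One common way around this is `transform`, which returns a result with the same index as the original rows. The `enddate` column name and the `misaligned` variable are assumptions added for illustration; they do not appear in the original question.

```python
import pandas as pd

# Toy data modeled on the sample in the question (two PERMNO series).
df = pd.DataFrame({
    "PERMNO": [10000] * 5 + [93436] * 5,
    "date": pd.to_datetime([
        "1986-01-31", "1986-02-28", "1986-03-31", "1986-04-30", "1986-05-30",
        "1986-06-30", "1986-07-31", "1986-08-29", "1986-09-30", "1986-10-31",
    ]),
})

# The aggregated result has one entry per group and is indexed by PERMNO,
# not by the row labels 0..9 of df.
per_group_min = df.groupby("PERMNO")["date"].min()

# Aligning that Series to df's row index finds no matching labels,
# which mirrors the all-NaT column seen in the question.
misaligned = per_group_min.reindex(df.index)

# transform() broadcasts each group's min/max back onto the original rows,
# so the result lines up with df and can be assigned directly.
df["startdate"] = df.groupby("PERMNO")["date"].transform("min")
df["enddate"] = df.groupby("PERMNO")["date"].transform("max")  # hypothetical column name

print(df)
```

If a one-row-per-series summary is enough, `df.groupby("PERMNO")["date"].agg(["min", "max"])` gives the same dates without broadcasting them to every row.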

0 answers:

No answers