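I have a pandas DataFrame that contains multiple series (identified by PERMNO), with one observation per date. I would like to define a start date and an end date for each series.
First, I tried: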
df['startdate'] = df['date'].min()
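This returns the following output, where every row gets the minimum date of the whole DataFrame rather than of its own series: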
PERMNO date ... pricereturn startdate
0 10000 1986-01-31 ... C 1925-12-31
1 10000 1986-02-28 ... -0.257143 1925-12-31
2 10000 1986-03-31 ... 0.365385 1925-12-31
3 10000 1986-04-30 ... -0.098592 1925-12-31
4 10000 1986-05-30 ... -0.222656 1925-12-31
... ... ... ... ... ...
3599488 93436 2018-08-31 ... 0.011806 1925-12-31
3599489 93436 2018-09-28 ... -0.122290 1925-12-31
3599490 93436 2018-10-31 ... 0.274011 1925-12-31
3599491 93436 2018-11-30 ... 0.039013 1925-12-31
3599492 93436 2018-12-31 ... -0.050445 1925-12-31
When I try the same command with a groupby on PERMNO, the code returns NaT for my start date:
df['startdate'] = df.groupby('PERMNO')['date'].min()
PERMNO date ... pricereturn startdate
0 10000 1986-01-31 ... C NaT
1 10000 1986-02-28 ... -0.257143 NaT
2 10000 1986-03-31 ... 0.365385 NaT
3 10000 1986-04-30 ... -0.098592 NaT
4 10000 1986-05-30 ... -0.222656 NaT
... ... ... ... ... ...
3599488 93436 2018-08-31 ... 0.011806 NaT
3599489 93436 2018-09-28 ... -0.122290 NaT
3599490 93436 2018-10-31 ... 0.274011 NaT
3599491 93436 2018-11-30 ... 0.039013 NaT
3599492 93436 2018-12-31 ... -0.050445 NaT
Can anyone tell me what is going wrong, or how I can fix it?
A sample of my data:
{'PERMNO': {0: 10000, 1: 10000, 2: 10000, 3: 10000, 4: 10000, 5: 93436, 6: 93436, 7: 93436, 8: 93436, 9: 93436}, 'date': {0: Timestamp('1986-01-31 00:00:00'), 1: Timestamp('1986-02-28 00:00:00'), 2: Timestamp('1986-03-31 00:00:00'), 3: Timestamp('1986-04-30 00:00:00'), 4: Timestamp('1986-05-30 00:00:00'), 5: Timestamp('1986-06-30 00:00:00'), 6: Timestamp('1986-07-31 00:00:00'), 7: Timestamp('1986-08-29 00:00:00'), 8: Timestamp('1986-09-30 00:00:00'), 9: Timestamp('1986-10-31 00:00:00')}}
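For reference, the NaT values are consistent with index alignment: `df.groupby('PERMNO')['date'].min()` returns a Series indexed by PERMNO (10000, 93436, ...), while the assignment aligns on the row index of `df` (0, 1, 2, ...), so no labels match. Below is a minimal sketch of one possible fix, assuming the goal is to broadcast each group's minimum and maximum date back onto its rows with `transform`. The `enddate` column name is my own choice for illustration, and the sample frame is rebuilt from the data above.

```python
import pandas as pd

# Rebuild the sample data shown above (two PERMNOs, five dates each).
df = pd.DataFrame({
    "PERMNO": [10000] * 5 + [93436] * 5,
    "date": pd.to_datetime([
        "1986-01-31", "1986-02-28", "1986-03-31", "1986-04-30", "1986-05-30",
        "1986-06-30", "1986-07-31", "1986-08-29", "1986-09-30", "1986-10-31",
    ]),
})

# transform returns a Series with the same index as df, so the
# assignment lines up row by row instead of aligning on PERMNO labels.
grouped = df.groupby("PERMNO")["date"]
df["startdate"] = grouped.transform("min")
df["enddate"] = grouped.transform("max")  # hypothetical column name

print(df)
```

An alternative under the same assumptions would be to keep the aggregated result and map it back, for example `df['PERMNO'].map(df.groupby('PERMNO')['date'].min())`, which can be handy when the per-series start dates are also needed as a separate lookup table.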