Handling perpetual bonds with a maturity date of 31-12-9999 12:00:00 AM

Time: 2018-03-22 06:02:40

Tags: pandas date dataframe string-to-datetime

  

Several records in my dataframe have a maturity date of 31-12-9999 12:00:00 AM, because the bonds never mature. This naturally raises an error:

Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
  

I see that the maximum date is:

pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
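
For context, a minimal sketch of where that upper bound comes from. It assumes the usual nanosecond representation (a signed 64-bit count of nanoseconds since 1970-01-01) and uses only the standard library, so the last three digits are dropped because `datetime` stops at microseconds. The names `INT64_MAX`, `upper` and `perpetual` are illustrative only.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are stored as signed 64-bit nanosecond counts
# measured from the Unix epoch (1970-01-01).
INT64_MAX = 2**63 - 1

# datetime only has microsecond precision, so truncate nanoseconds to microseconds.
upper = datetime(1970, 1, 1) + timedelta(microseconds=INT64_MAX // 1000)
print(upper)  # 2262-04-11 23:47:16.854775

# The placeholder maturity date used for perpetual bonds lies far beyond that bound.
perpetual = datetime(9999, 12, 31)
print(perpetual > upper)  # True
```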
  

I would like to know the best way to clean all the date columns in the dataframe and fix this error. My code follows the documentation:

df_Fix_Date = df_Date['maturity_date'].head(8)
display(df_Fix_Date)
display(df_Fix_Date.dtypes)

0    2020-08-15 00:00:00.000
1    2022-11-06 00:00:00.000
2    2019-03-15 00:00:00.000
3    2025-01-15 00:00:00.000
4    2035-05-29 00:00:00.000
5    2027-06-01 00:00:00.000
6    2021-04-01 00:00:00.000
7    2022-04-03 00:00:00.000
Name: maturity_date, dtype: object

def conv(x):
        return pd.Period(day = x%100, month = x//100 % 100, year = x // 10000, freq='D')

df_Fix_Date['maturity_date'] = pd.to_datetime(df_Fix_Date['maturity_date'])               # convert to datetype
df_Fix_Date['maturity_date'] = pd.PeriodIndex(df_Fix_Date['maturity_date'].apply(conv))   # fix error
display(df_Fix_Date)
  

Output:

KeyError: 'maturity_date'

1 answer:

Answer 0: (score: 1)

The problem is that out-of-bounds dates cannot be converted to datetimes.
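
The KeyError in the question looks like a separate issue: selecting a single column and calling head(8) yields a Series, which has no 'maturity_date' label to index. A rough sketch of that, together with one way to build daily periods directly from the strings, is below. The sample frame and the helper `to_period` are hypothetical; only the column name comes from the question, and the strings are assumed to be year-first as in the displayed output.

```python
import pandas as pd

# Hypothetical stand-in for df_Date; only the column name is from the question.
df_Date = pd.DataFrame({"maturity_date": ["2020-08-15 00:00:00.000",
                                          "9999-12-31 00:00:00.000"]})

# Selecting one column returns a Series, so there is no 'maturity_date' key in it.
s = df_Date["maturity_date"].head(8)
try:
    s["maturity_date"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True


def to_period(text):
    # Build a daily Period from the year, month and day parts of a
    # 'YYYY-MM-DD ...' string; Periods are not limited to the nanosecond range.
    year, month, day = (int(part) for part in text[:10].split("-"))
    return pd.Period(year=year, month=month, day=day, freq="D")


# Work on the Series itself instead of indexing it by column name.
periods = pd.PeriodIndex([to_period(t) for t in s], freq="D")
print(periods)
```

This keeps the 9999 maturity date intact, at the cost of working with periods rather than datetimes.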

One solution is to replace 9999 with 2261:

df_Fix_Date['maturity_date'] = df_Fix_Date['maturity_date'].replace('^9999','2261',regex=True)
df_Fix_Date['maturity_date'] = pd.to_datetime(df_Fix_Date['maturity_date']) 
print (df_Fix_Date)
  maturity_date
0    2020-08-15
1    2022-11-06
2    2019-03-15
3    2025-01-15
4    2035-05-29
5    2027-06-01
6    2021-04-01
7    2261-04-03
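
A self-contained sketch of the same idea, assuming year-first strings like those in the question's output. The sample frame is made up for illustration, with the last row standing in for a perpetual bond.

```python
import pandas as pd

# Hypothetical sample; the last row stands in for a perpetual bond.
df = pd.DataFrame({"maturity_date": ["2020-08-15 00:00:00.000",
                                     "9999-12-31 00:00:00.000"]})

# '^9999' only matches when the year is at the very start of the string.
cleaned = df["maturity_date"].replace("^9999", "2261", regex=True)

# After the substitution every value fits in the nanosecond range.
df["maturity_date"] = pd.to_datetime(cleaned)
print(df)
```

For day-first strings such as 31-12-9999 the `^` anchor would not match, so the pattern would need to be adjusted to wherever the year sits.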

Another solution is to replace the year with 2261 for every date whose year is greater than 2261:

m = df_Fix_Date['maturity_date'].str[:4].astype(int) > 2261
df_Fix_Date['maturity_date'] = df_Fix_Date['maturity_date'].mask(m, '2261' + df_Fix_Date['maturity_date'].str[4:])
df_Fix_Date['maturity_date'] = pd.to_datetime(df_Fix_Date['maturity_date']) 
print (df_Fix_Date)
  maturity_date
0    2020-08-15
1    2022-11-06
2    2019-03-15
3    2025-01-15
4    2035-05-29
5    2027-06-01
6    2021-04-01
7    2261-04-03
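
A minimal sketch of the capping approach on made-up data, again assuming year-first strings. The constant `MAX_YEAR` and the variable names are illustrative, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample with two years beyond the supported range.
s = pd.Series(["2020-08-15 00:00:00.000",
               "3000-01-01 00:00:00.000",
               "9999-12-31 00:00:00.000"], name="maturity_date")

MAX_YEAR = 2261  # last full calendar year below pd.Timestamp.max

# Boolean mask of rows whose four-digit year prefix is too large.
too_late = s.str[:4].astype(int) > MAX_YEAR

# Swap only the year prefix on those rows; month, day and time are kept.
capped = s.mask(too_late, str(MAX_YEAR) + s.str[4:])
dates = pd.to_datetime(capped)
print(dates)
```

One caveat: 2261 is not a leap year, so a 29 February in a far-future leap year would still fail to parse after capping.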

Or replace the problematic dates with NaT by passing the parameter errors='coerce':

df_Fix_Date['maturity_date'] = pd.to_datetime(df_Fix_Date['maturity_date'], errors='coerce') 
print (df_Fix_Date)
  maturity_date
0    2020-08-15
1    2022-11-06
2    2019-03-15
3    2025-01-15
4    2035-05-29
5    2027-06-01
6    2021-04-01
7           NaT
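
Since coercion discards the information that a bond is perpetual, it may help to record that first. The sketch below assumes nanosecond-resolution datetimes as in the pandas version used above, and year-first strings; the `is_perpetual` column is a hypothetical addition, not something from the question.

```python
import pandas as pd

# Hypothetical sample; the last row stands in for a perpetual bond.
df = pd.DataFrame({"maturity_date": ["2020-08-15 00:00:00.000",
                                     "9999-12-31 00:00:00.000"]})

# Hypothetical flag column, set before parsing so the perpetual rows
# stay identifiable once their dates have been coerced.
df["is_perpetual"] = df["maturity_date"].str.startswith("9999")

# errors='coerce' turns values that cannot be represented into missing
# values instead of raising an exception.
df["maturity_date"] = pd.to_datetime(df["maturity_date"], errors="coerce")
print(df)
```

Downstream code can then filter on `is_perpetual` rather than relying on a sentinel date.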