Question

I have a pandas DataFrame of the form:

    id     amount           birth
0   4      78.0      1980-02-02 00:00:00
1   5      24.0      1989-03-03 00:00:00
2   6      49.5      2014-01-01 00:00:00
3   7      34.0      2014-01-01 00:00:00
4   8      49.5      2014-01-01 00:00:00

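I am only interested in the year, month and day in the birth column of the DataFrame. I tried to use the Python datetime support in pandas, but that resulted in an error:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1054-02-07 00:00:00

The birth column has object dtype.

My guess is that it is an incorrect date. I do not want to pass errors="coerce" to the to_datetime method, because every item is important and I only need YYYY-MM-DD.

I tried using a regex through pandas:

df["birth"].str.find("(\d{4})-(\d{2})-(\d{2})")

But this does not give me the result I need. How can I solve this?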

Thanks

Answer 1

Since the values cannot be converted to datetimes, you can split on the first whitespace and then select the first value:

df['birth'] = df['birth'].str.split().str[0]
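
As a quick illustration of the split step, here is a minimal sketch on made-up sample data (the values and the name date_only are assumptions for this example, not part of the original post):

```python
import pandas as pd

# Hypothetical sample data; the second row is outside the nanosecond Timestamp range
df = pd.DataFrame({
    "id": [4, 5, 6],
    "amount": [78.0, 24.0, 49.5],
    "birth": ["1980-02-02 00:00:00", "1054-02-07 00:00:00", "2014-01-01 00:00:00"],
})

# Split each string on whitespace and keep the first piece (the date part)
date_only = df["birth"].str.split().str[0]
print(date_only.tolist())
```

The result is still a column of plain strings, so no date is validated or dropped at this stage.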

Then, if necessary, convert to periods; see "Representing out-of-bounds spans" in the pandas time series documentation.
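
To show why a period can hold the date from the error message, here is a small sketch; it only assumes that pd.Period accepts year, month and day keywords with a daily frequency, as in the function further down:

```python
import pandas as pd

# A daily Period is not limited to the nanosecond Timestamp range,
# so the year 1054 from the error message can be represented
p = pd.Period(year=1054, month=2, day=7, freq="D")

# The individual components remain accessible
print(p.year, p.month, p.day)
```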

print (df)
   id  amount                birth
0   4    78.0  1980-02-02 00:00:00
1   5    24.0  1989-03-03 00:00:00
2   6    49.5  2014-01-01 00:00:00
3   7    34.0  2014-01-01 00:00:00
4   8    49.5     0-01-01 00:00:00

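import pandas as pd

def to_per(x):
    # x is a 'YYYY-MM-DD' string: split it into year, month and day parts
    splitted = x.split('-')
    # Build a daily Period, which can represent out-of-bounds dates
    return pd.Period(year=int(splitted[0]),
                     month=int(splitted[1]),
                     day=int(splitted[2]), freq='D')

# Keep only the date part, then convert each string to a Period
df['birth'] = df['birth'].str.split().str[0].apply(to_per)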

print (df)
   id  amount       birth
0   4    78.0  1980-02-02
1   5    24.0  1989-03-03
2   6    49.5  2014-01-01
3   7    34.0  2014-01-01
4   8    49.5  0000-01-01
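
As an aside that is not part of the answer above: Series.str.find looks for a literal substring and returns its position (or -1), so it does not treat the pattern as a regex. A minimal sketch of a regex-based alternative, assuming Series.str.extract with named groups is acceptable and that every value has a four-digit year (a value such as 0-01-01 would not match this pattern):

```python
import pandas as pd

# Hypothetical sample values in the same format as the birth column
birth = pd.Series(["1980-02-02 00:00:00", "1054-02-07 00:00:00"])

# str.extract applies the regex and returns one column per capture group
parts = birth.str.extract(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Convert the captured strings to integers; this would fail if any row
# did not match, because non-matching rows come back as missing values
parts = parts.astype(int)
print(parts)
```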

Match datetime YYYY-MM-DD object in pandas DataFrame
