I have a pandas DataFrame "df_OUT", described below. I am using Python 2.7.
The column dtypes are as follows -
>>> df_OUT.dtypes
TRX_DATE datetime64[ns]
ACTUAL_DATE_CLOSED object
Now I want to find the difference between "TRX_DATE" and "ACTUAL_DATE_CLOSED" as a number of days.
The first five rows of the data look like this -
>>> df_OUT.head(5)
TRX_DATE ACTUAL_DATE_CLOSED
0 1995-09-08 4712-12-31 00:00:00
2 2003-06-30 4712-12-31 00:00:00
3 2003-06-30 4712-12-31 00:00:00
4 2003-06-30 4712-12-31 00:00:00
6 1999-08-31 2099-08-31 00:00:00
I tried the expression below, but it gives me an error -
df_FINAL_RESULTS['TRX_DATE']-df_FINAL_RESULTS['ACTUAL_DATE_CLOSED'].map(lambda x: x.strftime('%Y-%m-%d'))
Can you please guide me?
Thanks.
Answer 0: (score: 1)
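Your problem is that the maximum date a pandas Timestamp can hold falls in the year 2262, and 4712-12-31 is far beyond that. We need to use the Python datetime.date construct instead.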
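As a rough sanity check on that limit, here is a minimal sketch that uses only the standard library. It assumes the usual nanosecond representation (a signed 64-bit count of nanoseconds since 1970-01-01); the names INT64_MAX, EPOCH and upper_bound are made up for illustration and are not part of pandas.

```python
import datetime as dt

# Assumption: a nanosecond-resolution timestamp is a signed 64-bit
# integer counting nanoseconds since the Unix epoch (1970-01-01).
INT64_MAX = 2**63 - 1
EPOCH = dt.datetime(1970, 1, 1)

# datetime only has microsecond resolution, so drop the last three digits.
upper_bound = EPOCH + dt.timedelta(microseconds=INT64_MAX // 1000)
print(upper_bound)  # 2262-04-11 23:47:16.854775

# The placeholder "closed" date from the question is far past that bound.
closed = dt.datetime(4712, 12, 31)
print(closed > upper_bound)  # True
```

Plain datetime.date objects do not have this restriction (they go up to the year 9999), which is why the code below converts both columns to them.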
# this is not nice data - well past pandas.Timestamp.max
# let's get it as strings into a pandas DataFrame
data = """index, TRX_DATE, ACTUAL_DATE_CLOSED
0, 1995-09-08, 4712-12-31 00:00:00
2, 2003-06-30, 4712-12-31 00:00:00
3, 2003-06-30, 4712-12-31 00:00:00
4, 2003-06-30, 4712-12-31 00:00:00
6, 1999-08-31, 2099-08-31 00:00:00
"""
import pandas as pd
from StringIO import StringIO # Python 2; on Python 3 use: from io import StringIO
df = pd.read_csv(StringIO(data), header=0, sep=',', index_col=0,
                 skipinitialspace=True, dtype={'ACTUAL_DATE_CLOSED': object})
# convert to python datetime.date - will do in new columns
import datetime as dt
df['closed'] = [dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S').date()
                for x in df['ACTUAL_DATE_CLOSED']]
df['transaction'] = [dt.datetime.strptime(x, '%Y-%m-%d').date()
                     for x in df['TRX_DATE']]
# find the difference between the two dates
df['difference'] = df['closed'] - df['transaction']
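Since the question asks for a number of days, one option is to take the .days attribute of each datetime.timedelta. The sketch below is only an illustration: it works on plain strings copied from the question's sample rows rather than on the DataFrame, and days_between is a hypothetical helper name, not something from pandas.

```python
import datetime as dt

# Sample values copied from the question's df_OUT.head(5) output.
rows = [
    ("1995-09-08", "4712-12-31 00:00:00"),
    ("2003-06-30", "4712-12-31 00:00:00"),
    ("1999-08-31", "2099-08-31 00:00:00"),
]


def days_between(trx_date, closed):
    """Return closed minus trx_date as a whole number of days."""
    start = dt.datetime.strptime(trx_date, "%Y-%m-%d").date()
    end = dt.datetime.strptime(closed, "%Y-%m-%d %H:%M:%S").date()
    # date - date gives a datetime.timedelta; .days is a plain integer
    return (end - start).days


day_counts = [days_between(t, c) for t, c in rows]
print(day_counts[-1])  # 36525
```

With the DataFrame above, the same idea could be written as df['days'] = [(c - t).days for c, t in zip(df['closed'], df['transaction'])], which keeps the result as ordinary integers.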