Python - Get the time difference from the first date in a column

Time: 2018-10-26 03:19:20

Tags: python pandas datetime dataframe

Given a date column, I want to create another column, diff, that gives the number of days elapsed since the first date.

date                    diff
2011-01-01 00:00:10      0
2011-01-01 00:00:11      0.000011 days
2011-02-01 00:00:11      30.000011 days 
2013-02-01 00:00:11      395.000011 days
2014-02-01 00:00:11      760.000011 days

The dates are datetimes. What I have tried so far:

df = df.sort_values(['date'], ascending=True)
df.set_index('date', inplace = True)
first = df.index[0]
df['diff'] = (first - df.index.shift()).fillna(0)

4 Answers:

Answer 0 (score: 1)

You can try

df['diff'] = df.date - df.date.min()

df
                 date               diff
0 2011-01-01 00:00:10    0 days 00:00:00
1 2011-01-01 00:00:11    0 days 00:00:01
2 2011-02-01 00:00:11   31 days 00:00:01
3 2013-02-01 00:00:11  762 days 00:00:01
4 2014-02-01 00:00:11 1127 days 00:00:01
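A self-contained sketch of the same idea, assuming the frame is rebuilt from the sample values in the question (the construction of df here is illustrative and not part of the original post):

```python
import pandas as pd

# Sample dates copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01 00:00:10",
        "2011-01-01 00:00:11",
        "2011-02-01 00:00:11",
        "2013-02-01 00:00:11",
        "2014-02-01 00:00:11",
    ])
})

# Subtracting the earliest timestamp from every row yields a timedelta column.
df["diff"] = df["date"] - df["date"].min()
print(df)
```

Using min() rather than the first row means the result should not depend on the frame being sorted beforehand.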

Answer 1 (score: 0)

You can use this approach without setting a new index.

Original dataframe

df
                 date        diff
0 2011-01-01 00:00:10    0.000000
1 2011-01-01 00:00:11    0.000011
2 2011-02-01 00:00:11   30.000011
3 2013-02-01 00:00:11  395.000011
4 2014-02-01 00:00:11  760.000011

Possible answer

df['diff_new'] = df['date'] - df.loc[0,'date']

                 date        diff           diff_new
0 2011-01-01 00:00:10    0.000000    0 days 00:00:00
1 2011-01-01 00:00:11    0.000011    0 days 00:00:01
2 2011-02-01 00:00:11   30.000011   31 days 00:00:01
3 2013-02-01 00:00:11  395.000011  762 days 00:00:01
4 2014-02-01 00:00:11  760.000011 1127 days 00:00:01

By the way, I get a different date difference for the third row than the one in your original data. You can check it manually with this online tool to calculate date differences in days.
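The third-row difference can also be checked without pandas. A minimal sketch using only the standard library, with the two timestamps copied from the sample data (the variable names are illustrative):

```python
from datetime import datetime

# First and third rows of the sample data.
first = datetime(2011, 1, 1, 0, 0, 10)
third = datetime(2011, 2, 1, 0, 0, 11)

# January has 31 days, so the gap is 31 days plus 1 second.
delta = third - first
print(delta.days, delta.seconds)  # 31 1
```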

Answer 2 (score: 0)

Here is something you can try.

>>> df
                  date
0  2011-01-01 00:00:10
1  2011-01-01 00:00:11
2  2011-02-01 00:00:11
3  2013-02-01 00:00:11
4  2014-02-01 00:00:11

First convert the values to timestamps so the data is constructed properly. After the conversion, simply difference the DataFrame.

>>> df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])
>>> df['diff']  = (df2 - df2.shift()).fillna(0)
>>> df
                  date              diff
0  2011-01-01 00:00:10   0 days 00:00:00
1  2011-01-01 00:00:11   0 days 00:00:01
2  2011-02-01 00:00:11  31 days 00:00:00
3  2013-02-01 00:00:11 731 days 00:00:00
4  2014-02-01 00:00:11 365 days 00:00:00
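Note that `shift()` compares each row with the one before it, so the output above is a row-to-row difference rather than the difference from the first date. A hedged sketch of how the two relate, assuming a Series rebuilt from the sample values: cumulatively summing the row-to-row differences should recover the offset from the first row.

```python
import pandas as pd

# Sample dates copied from the question; the Series construction is an assumption.
dates = pd.Series(pd.to_datetime([
    "2011-01-01 00:00:10",
    "2011-01-01 00:00:11",
    "2011-02-01 00:00:11",
    "2013-02-01 00:00:11",
    "2014-02-01 00:00:11",
]))

# Difference from the previous row; the first row has no predecessor, so fill with zero.
step = dates.diff().fillna(pd.Timedelta(0))

# Running total of the steps, which equals the difference from the first row.
from_first = step.cumsum()
print(pd.DataFrame({"date": dates, "step": step, "from_first": from_first}))
```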

Answer 3 (score: 0)

This is how I would get the days as floating-point values:

dates = pd.to_datetime(df.date) # make sure we are working with dates and not strings
df["diff"] = (dates - dates[0]).apply(lambda x: x.total_seconds() / 86400)

Resulting df

                  date         diff
0  2011-01-01 00:00:10     0.000000
1  2011-01-01 00:00:11     0.000012
2  2011-02-01 00:00:11    31.000012
3  2013-02-01 00:00:11   762.000012
4  2014-02-01 00:00:11  1127.000012
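As a possible alternative to the per-element `apply`, the same float values can be computed with vectorized operations. This is a sketch under the assumption that the dates are rebuilt from the sample values; `elapsed`, `days_a`, and `days_b` are illustrative names:

```python
import pandas as pd

# Sample dates copied from the question; the Series construction is an assumption.
dates = pd.Series(pd.to_datetime([
    "2011-01-01 00:00:10",
    "2011-01-01 00:00:11",
    "2011-02-01 00:00:11",
    "2013-02-01 00:00:11",
    "2014-02-01 00:00:11",
]))

elapsed = dates - dates.iloc[0]

# Option 1: total seconds via the .dt accessor, converted to days.
days_a = elapsed.dt.total_seconds() / 86400

# Option 2: divide the timedeltas by a one-day Timedelta.
days_b = elapsed / pd.Timedelta(days=1)

print(pd.DataFrame({"date": dates, "days_a": days_a, "days_b": days_b}))
```

Both options avoid a Python-level loop over the rows, and `iloc[0]` selects the first row by position regardless of the index labels.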