Subtracting floats from datetimes in pandas (numpy)

Date: 2014-11-26 15:03:16

Tags: python numpy pandas python-datetime multidimensional-array

How can I subtract float values from a datetime64 array in a vectorized way?

Data:

import numpy as np
import pandas as pd

some_dates = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
some_ints = np.array([1 ,2 ,3], dtype = 'int64')
some_float = np.array([1.00 ,2.00 ,3.00], dtype = 'float64')

data_dict = {'dates':some_dates, 
             'ints':some_ints, 
             'floats':some_float}

test_data = pd.DataFrame(data_dict)

It looks like this:

Out[1]: 
       dates  floats  ints
0 2007-07-13       1     1
1 2006-01-13       2     2
2 2010-08-13       3     3

What I want to do:

#===============================================================================
# Works well
#===============================================================================
test_data['dates'] = test_data['dates'].sub(test_data['ints'])

However, my vectors contain NaN values. NaN is not supported in integer vectors, so the values are automatically converted to float:

#------------------------------------------------------------------------------ 
# Converts ints to floats 

test_data.dtypes

> Out[2]: 
> dates     datetime64[ns]
> floats           float64
> ints               int64
> dtype: object

test_data.loc[2:2, 'ints'] = None

> Out[3]: 
> dates     datetime64[ns]
> floats           float64
> ints             float64
> dtype: object

>  Out[4]: 
>        dates  floats  ints
> 0 2007-07-13       1     1
> 1 2006-01-13       2     2
> 2 2010-08-13       3   NaN

But I cannot subtract floats from my dates:

#----------------------------------------------------------------------------- #
# But this way also doesn't work
test_data['dates'] = test_data['dates'].sub(test_data['floats'])

> TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')

I found a workaround, but it is extremely slow because apply runs row by row "in Python":

# from dateutil.relativedelta import relativedelta
def sub_float(df_row):
    # Subtract the float number of days only when it is not NaN
    if pd.notnull(df_row['floats']):
        # df_row['dates'] = df_row['dates'] - relativedelta(days=df_row['floats'])
        df_row['dates'] = df_row['dates'] - pd.DateOffset(days=df_row['floats'])
    return df_row['dates']

test_data['dates'] = test_data.apply(sub_float, axis=1)

Any suggestions on how to subtract floats from datetimes in a vectorized way?

1 answer:

Answer 0 (score: 4):

Convert the floats to timedeltas (which can handle NaN):

In [22]: df
Out[22]:
       dates  floats  ints
0 2007-07-13     NaN     1
1 2006-01-13       2     2
2 2010-08-13       3     3

In [23]: df.dates - pd.to_timedelta(df.floats.astype(str), unit='D')
Out[23]:
0          NaT
1   2006-01-11
2   2010-08-10
dtype: datetime64[ns]
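
For reference, here is a minimal sketch of the same idea as a standalone script. It assumes a recent pandas version, where pd.to_timedelta should accept a float Series directly together with unit='D', so the astype(str) step may not be needed (newer versions may even reject a unit combined with string input). The names deltas and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the frame shown in the answer (NaN in the first row).
df = pd.DataFrame({
    'dates': pd.to_datetime(['2007-07-13', '2006-01-13', '2010-08-13']),
    'floats': [np.nan, 2.0, 3.0],
})

# Interpret each float as a number of days; NaN becomes NaT.
deltas = pd.to_timedelta(df['floats'], unit='D')

# Vectorized subtraction; a row whose delta is NaT yields NaT in the result.
result = df['dates'] - deltas
print(result)
```

If the original dates should be kept for rows with a missing float, one option is result.fillna(df['dates']). Fractional values should be treated as fractions of a day under the same assumption, so 1.5 would subtract 1 day and 12 hours.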