Time difference between a time period and an instant

Time: 2019-03-05 15:22:32

Tags: python pandas apply

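I have some instants (A, B) and some time periods (C to D). I want to find the time difference between each period and the instant that belongs to it. What I mean is:

If the time (A or B) lies between C and D

then dT = 0; otherwise dT is the gap between the time and the nearer end of the period (C or D).

This is what I tried: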

import datetime as dt
from datetime import timedelta

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6),      dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7),      dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})

# Calculate the time difference
def dT(Time, on, off):
    if Time < on:
        return on - Time
    elif Time > off:
        return Time - off
    else:
        return 0
dT = np.vectorize(dT)

df['dT_A'] = dT(df['A'], df['C'], df['D'])
df['dT_B'] = dT(df['B'], df['C'], df['D'])

# Change the time difference to a float
def floa(dT):
    if dT == 0:
        return 0
    else:
        return dT / timedelta(days=1)
floa = np.vectorize(floa)

df['dT_A'] = floa(df['dT_A'])
df['dT_B'] = floa(df['dT_B'])

It computes dT_A, but then gives me this error:

OverflowError: Python int too large to convert to C long
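
One documented np.vectorize behaviour that may be relevant to an error like this: unless `otypes` is passed, the output dtype is inferred from the result of the first call. The sketch below is only an illustration with made-up numbers (the function `first_is_int` is hypothetical and not part of the code above). It shows a first result of integer 0 forcing an integer output array, and `otypes` avoiding that.

```python
import numpy as np

def first_is_int(x):
    # Returns an int for the first input and floats afterwards,
    # similar in spirit to returning 0 in one branch and a
    # timedelta in the others.
    if x < 1:
        return 0
    return x + 0.5

values = [0, 1, 2]

# Output dtype is inferred from first_is_int(0) == 0, i.e. an integer dtype,
# so the later float results are cast to integers.
inferred = np.vectorize(first_is_int)(values)

# Stating the output type explicitly keeps the float results intact.
explicit = np.vectorize(first_is_int, otypes=[float])(values)

print(inferred.dtype, inferred)
print(explicit.dtype, explicit)
```

Whether this is the exact cause of the OverflowError above depends on the platform and library versions, so treat it as a lead rather than a diagnosis.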

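2 Answers:

Answer 0 (score: 2)

Despite its name, np.vectorize is not vectorized; it loops under the hood. So where you can, it is better to operate on whole columns, and fortunately what you want is easy to do in plain ("vanilla") pandas: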

import datetime as dt
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6),      dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7),      dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})
# default is a zero timedelta, so the columns get a timedelta dtype rather than int
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)

df.loc[df.A < df.C, 'dT_A'] = (df.C - df.A) .loc[df.A < df.C]
df.loc[df.A > df.D, 'dT_A'] = (df.A - df.D) .loc[df.A > df.D]

df.loc[df.B < df.C, 'dT_B'] = (df.C - df.B) .loc[df.B < df.C]
df.loc[df.B > df.D, 'dT_B'] = (df.B - df.D) .loc[df.B > df.D]

# convert timedelta to number of days, to float
df['dT_A'] = df.dT_A / dt.timedelta(days=1)
df['dT_B'] = df.dT_B / dt.timedelta(days=1)
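
The same column-wise idea can also be written without pre-filling a column, using `Series.where` to zero out the branches that do not apply. This is a minimal sketch, not the answer's own method; the helper name `gap_in_days` is made up for illustration.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017, 1, 6), dt.datetime(2017, 1, 4)],
                   'B': [dt.datetime(2017, 1, 7), dt.datetime(2017, 1, 5)],
                   'C': [dt.datetime(2017, 1, 6, 12, 3), dt.datetime(2017, 1, 6, 13, 3)],
                   'D': [dt.datetime(2017, 1, 8, 12, 3), dt.datetime(2017, 1, 8, 14, 3)]})

def gap_in_days(t, start, end):
    """Distance in days from instant t to the period [start, end]; 0 inside it."""
    zero = pd.Timedelta(0)
    # Keep (start - t) only where t is before the period, else use zero.
    before = (start - t).where(t < start, zero)
    # Keep (t - end) only where t is after the period, else use zero.
    after = (t - end).where(t > end, zero)
    # At most one of the two terms is non-zero for any row.
    return (before + after) / pd.Timedelta(days=1)

df['dT_A'] = gap_in_days(df['A'], df['C'], df['D'])
df['dT_B'] = gap_in_days(df['B'], df['C'], df['D'])
print(df[['dT_A', 'dT_B']])
```

Because both intermediate Series are timedelta-typed from the start, no integer column is ever mixed with timedeltas.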

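Answer 1 (score: 0)

Josh's answer (Method A) works on my computer but not on my colleague's. On the colleague's computer we had to add a second .loc on the right-hand side (Method B). My colleague claims that Method A tries to put the entire column into each row. Method B works for both of us, so I will edit Josh's answer accordingly.

Method A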

import datetime as dt
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6),      dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7),      dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})
# default is a zero timedelta, so the columns get a timedelta dtype rather than int
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)

df.loc[df.A < df.C, 'dT_A'] = df.C - df.A
df.loc[df.A > df.D, 'dT_A'] = df.A - df.D

df.loc[df.B < df.C, 'dT_B'] = df.C - df.B
df.loc[df.B > df.D, 'dT_B'] = df.B - df.D

# convert timedelta to number of days, to float
df['dT_A'] = df.dT_A / dt.timedelta(days=1)
df['dT_B'] = df.dT_B / dt.timedelta(days=1)

Method B

import datetime as dt
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6),      dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7),      dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})

# default is a zero timedelta, so the columns get a timedelta dtype rather than int
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)

df.loc[df.A < df.C, 'dT_A'] = (df.C - df.A) .loc[df.A < df.C]
df.loc[df.A > df.D, 'dT_A'] = (df.A - df.D) .loc[df.A > df.D]

df.loc[df.B < df.C, 'dT_B'] = (df.C - df.B) .loc[df.B < df.C]
df.loc[df.B > df.D, 'dT_B'] = (df.B - df.D) .loc[df.B > df.D]

# Convert timedelta to float, number of days
df['dT_A'] = df.dT_A / np.timedelta64(1, 'D')
df['dT_B'] = df.dT_B / np.timedelta64(1, 'D')
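
To compare the two methods on your own machine, a small check like the one below may help. It is a sketch with made-up column names (`t`, `start`, `a`, `b`). As far as I know, assigning a Series through .loc aligns it on the index, so both forms should select the same rows, but that is worth confirming on the pandas version where Method A failed.

```python
import pandas as pd

df = pd.DataFrame({
    't': pd.to_datetime(['2017-01-06', '2017-01-07']),
    'start': pd.to_datetime(['2017-01-06 12:03', '2017-01-06 13:03']),
})
mask = df['t'] < df['start']   # True for row 0 only
diff = df['start'] - df['t']

# Method A style: assign the full Series and rely on index alignment.
df['a'] = pd.Timedelta(0)
df.loc[mask, 'a'] = diff

# Method B style: pre-select the matching rows on the right-hand side.
df['b'] = pd.Timedelta(0)
df.loc[mask, 'b'] = diff.loc[mask]

print(df[['a', 'b']])
print("identical:", df['a'].equals(df['b']))
```

If the last line prints False, or the Method A assignment raises, the traceback and the output of `pd.__version__` would be the first things to compare between the two computers.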