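I have some moments (A, B) and some time periods (C-D). For each row, I want the time difference between the period and the moment that belongs to it. What I mean is:
if the time (A or B) lies between C and D,
then dT = 0; otherwise dT is the gap to the nearer end of the period (C - time if the time is before C, time - D if it is after D).
This is what I tried: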
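The rule can be written without branches as dT = max(C - t, 0) + max(t - D, 0), assuming C <= D so that at most one term is positive. A minimal scalar sketch (the function name `time_to_interval` is hypothetical, not from the thread):

```python
import datetime as dt


def time_to_interval(t, start, end):
    """Distance from moment t to the interval [start, end]; zero if t lies inside."""
    zero = dt.timedelta(0)
    # At most one of the two terms is positive, because start <= end.
    return max(start - t, zero) + max(t - end, zero)


start = dt.datetime(2017, 1, 6, 12, 3)
end = dt.datetime(2017, 1, 8, 12, 3)
print(time_to_interval(dt.datetime(2017, 1, 7), start, end))  # inside -> 0:00:00
print(time_to_interval(dt.datetime(2017, 1, 6), start, end))  # before -> 12:03:00
```

Returning `timedelta(0)` instead of the integer `0` keeps the result type the same in every case.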
import datetime as dt
from datetime import timedelta

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6), dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7), dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})

# Calculate the time difference
def dT(Time, on, off):
    if Time < on:
        return on - Time
    elif Time > off:
        return Time - off
    else:
        return 0  # note: an int here, not a timedelta

dT = np.vectorize(dT)
df['dT_A'] = dT(df['A'], df['C'], df['D'])
df['dT_B'] = dT(df['B'], df['C'], df['D'])

# Change the time difference to a float
def floa(dT):
    if dT == 0:
        return 0
    else:
        return dT / timedelta(days=1)

floa = np.vectorize(floa)
df['dT_A'] = floa(df['dT_A'])
df['dT_B'] = floa(df['dT_B'])
It computed dT_A, but then gave me this error:
OverflowError: Python int too large to convert to C long
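One plausible contributor to the error, not confirmed in the thread: in the NumPy implementations I know of, `np.vectorize` converts its arguments to object arrays before calling the function, and a `datetime64` value at nanosecond resolution (the resolution pandas has historically used) becomes a plain Python `int` when cast to object, because `datetime.datetime` cannot hold nanoseconds. The function then ends up working with very large integers instead of datetimes. A small sketch of just that cast:

```python
import numpy as np

# The same moment stored at two different resolutions.
ns = np.array(["2017-01-06T00:00"], dtype="datetime64[ns]")
us = np.array(["2017-01-06T00:00"], dtype="datetime64[us]")

# Casting to object, as np.vectorize does with its inputs.
print(type(ns.astype(object)[0]))  # <class 'int'> (nanoseconds since the epoch)
print(type(us.astype(object)[0]))  # <class 'datetime.datetime'>
```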
Answer 0 (score: 2)
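Despite its name, np.vectorize is not actually vectorized: it is essentially a Python loop. So where you can, it is better to operate on whole vectors, and fortunately what you want is easy to do in "vanilla" pandas: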
import datetime as dt

import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6), dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7), dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})
# default is 0 (as a timedelta, so the columns keep a timedelta dtype)
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)
df.loc[df.A < df.C, 'dT_A'] = (df.C - df.A).loc[df.A < df.C]
df.loc[df.A > df.D, 'dT_A'] = (df.A - df.D).loc[df.A > df.D]
df.loc[df.B < df.C, 'dT_B'] = (df.C - df.B).loc[df.B < df.C]
df.loc[df.B > df.D, 'dT_B'] = (df.B - df.D).loc[df.B > df.D]
# convert timedelta to number of days, to float
df['dT_A'] = df.dT_A / dt.timedelta(days=1)
df['dT_B'] = df.dT_B / dt.timedelta(days=1)
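The same logic can also be collapsed into one expression per column by keeping only the positive part of each gap, dT = max(C - t, 0) + max(t - D, 0). A sketch, assuming C <= D in every row; it converts to float days first and then uses `Series.clip`, so there are no masks and no columns that mix ints with timedeltas:

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017, 1, 6), dt.datetime(2017, 1, 4)],
                   'B': [dt.datetime(2017, 1, 7), dt.datetime(2017, 1, 5)],
                   'C': [dt.datetime(2017, 1, 6, 12, 3), dt.datetime(2017, 1, 6, 13, 3)],
                   'D': [dt.datetime(2017, 1, 8, 12, 3), dt.datetime(2017, 1, 8, 14, 3)]})

one_day = pd.Timedelta(days=1)
for col in ['A', 'B']:
    # Days before the start of the period (negative if the moment is after C).
    before = (df['C'] - df[col]) / one_day
    # Days after the end of the period (negative if the moment is before D).
    after = (df[col] - df['D']) / one_day
    # Keep only the positive parts; inside [C, D] both are <= 0, giving 0.
    df['dT_' + col] = before.clip(lower=0) + after.clip(lower=0)

print(df[['dT_A', 'dT_B']])
```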
Answer 1 (score: 0)
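Josh's answer (method A) works on my computer, but not on my colleague's. On my colleague's computer we had to apply .loc on the right-hand side as well (method B). My colleague's explanation is that method A tries to put the entire column into each row. Method B works for both of us, so I will edit Josh's answer accordingly.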
Method A
import datetime as dt

import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6), dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7), dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})
# default is 0 (as a timedelta, so the columns keep a timedelta dtype)
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)
df.loc[df.A < df.C, 'dT_A'] = df.C - df.A
df.loc[df.A > df.D, 'dT_A'] = df.A - df.D
df.loc[df.B < df.C, 'dT_B'] = df.C - df.B
df.loc[df.B > df.D, 'dT_B'] = df.B - df.D
# convert timedelta to number of days, to float
df['dT_A'] = df.dT_A / dt.timedelta(days=1)
df['dT_B'] = df.dT_B / dt.timedelta(days=1)
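The only structural difference between methods A and B is whether the right-hand side of each `.loc` assignment is the full-length Series or just its masked rows. As far as I know, pandas aligns a right-hand Series on the index in `.loc` assignments, so both forms should give the same result; if they differ on a particular machine, comparing pandas versions is a sensible first step. A toy check (the frame and the column names `x` and `y` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 3]})
diff = df['x'] * 10   # full-length Series sharing df's index
mask = df['x'] > 2    # selects rows 1 and 2

# Method A style: full right-hand Series, aligned on the index by .loc.
a = df.copy()
a['y'] = 0
a.loc[mask, 'y'] = diff

# Method B style: right-hand side already restricted to the masked rows.
b = df.copy()
b['y'] = 0
b.loc[mask, 'y'] = diff.loc[mask]

print(a['y'].tolist())  # [0, 50, 30]
print(b['y'].tolist())  # [0, 50, 30]
```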
Method B
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [dt.datetime(2017,1,6), dt.datetime(2017,1,4)],
                   'B': [dt.datetime(2017,1,7), dt.datetime(2017,1,5)],
                   'C': [dt.datetime(2017,1,6,12,3), dt.datetime(2017,1,6,13,3)],
                   'D': [dt.datetime(2017,1,8,12,3), dt.datetime(2017,1,8,14,3)]})
# default is 0 (as a timedelta, so the columns keep a timedelta dtype)
df['dT_A'] = pd.Timedelta(0)
df['dT_B'] = pd.Timedelta(0)
df.loc[df.A < df.C, 'dT_A'] = (df.C - df.A).loc[df.A < df.C]
df.loc[df.A > df.D, 'dT_A'] = (df.A - df.D).loc[df.A > df.D]
df.loc[df.B < df.C, 'dT_B'] = (df.C - df.B).loc[df.B < df.C]
df.loc[df.B > df.D, 'dT_B'] = (df.B - df.D).loc[df.B > df.D]
# Convert timedelta to float, number of days
df['dT_A'] = df.dT_A / np.timedelta64(1, 'D')
df['dT_B'] = df.dT_B / np.timedelta64(1, 'D')
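Method B also swaps the divisor. Dividing a timedelta64 Series by any one-day duration yields float days, so `dt.timedelta(days=1)`, `np.timedelta64(1, 'D')` and `pd.Timedelta(days=1)` should be interchangeable here; `Series.dt.total_seconds()` divided by 86400 is another route, which may differ from the others in the last floating-point bit. A short sketch comparing them on sample gaps taken from the example data:

```python
import datetime as dt

import numpy as np
import pandas as pd

gaps = pd.Series(pd.to_timedelta(['0 days', '12:03:00', '2 days 13:03:00']))

# Three one-day divisors; each gives a float64 Series of days.
by_datetime = gaps / dt.timedelta(days=1)
by_numpy = gaps / np.timedelta64(1, 'D')
by_pandas = gaps / pd.Timedelta(days=1)
# Route via seconds; allow for tiny rounding differences when comparing.
by_seconds = gaps.dt.total_seconds() / 86400

print(by_numpy.tolist())
```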