pandas: computing the minimum time delta by group

Time: 2016-12-07 06:07:57

Tags: pandas timedelta

My input DataFrames are as follows:

import pandas as pd

Input1 = pd.DataFrame({'LOT': {0: 'A1', 1: 'A2', 2: 'A3', 3: 'A4', 4: 'A5'},
 'OPERATION': {0: 100.0, 1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0},
 'TXN_DATE': {0: '12/6/2016',
  1: '12/5/2016',
  2: '11/30/2016',
  3: '11/27/2016',
  4: '11/22/2016'}})

Input2 = pd.DataFrame({'LOT': {0: 'B1', 1: 'B2', 2: 'B3', 3: 'B4', 4: 'B5', 5: 'B6'},
 'OPERATION': {0: 500, 1: 500, 2: 500, 3: 500, 4: 500, 5: 500},
 'TXN_DATE': {0: '12/7/2016',
  1: '12/3/2016',
  2: '11/17/2016',
  3: '11/22/2016',
  4: '12/4/2016',
  5: '12/3/2016'}})

For each lot in Input1, I want to find its companion lot from Input2, chosen as the Input2 lot with the smallest TXN_DATE delta to it:

Final DataFrame:

Expected_out =  pd.DataFrame({'COMPANION_LOT': {0: 'B5', 1: 'B5', 2: 'B4', 3: 'B4', 4: 'B4'},
 'COMPANION_LOT TXN_DATE': {0: '12/4/2016',
  1: '12/4/2016',
  2: '11/22/2016',
  3: '11/22/2016',
  4: '11/22/2016'},
 'LOT': {0: 'A1', 1: 'A2', 2: 'A3', 3: 'A4', 4: 'A5'},
 'OPERATION': {0: 100, 1: 100, 2: 100, 3: 100, 4: 100},
 'TXN_DATE': {0: '12/6/2016',
  1: '12/5/2016',
  2: '11/30/2016',
  3: '11/27/2016',
  4: '11/22/2016'}})

Thanks.

1 Answer:

Answer 0 (score: 1)

You can do most of the work with pandas.merge_asof, then add the new column with map:

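# Parse the date strings so they compare as timestamps
Input1['TXN_DATE'] = pd.to_datetime(Input1['TXN_DATE'])
Input2['TXN_DATE'] = pd.to_datetime(Input2['TXN_DATE'])

# merge_asof requires both frames to be sorted by the key
Input1 = Input1.sort_values('TXN_DATE')
Input2 = Input2.sort_values('TXN_DATE')

# For each Input1 row, take the last Input2 row whose TXN_DATE is <= the Input1 date
df = (pd.merge_asof(Input1, Input2, on='TXN_DATE', suffixes=('', '_COMPANION'))
        .sort_values('LOT')
        .drop('OPERATION_COMPANION', axis=1))

# Look up each companion lot's own TXN_DATE
df['LOT_TXN_DATE'] = df['LOT_COMPANION'].map(Input2.set_index('LOT')['TXN_DATE'])
print(df)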
  LOT  OPERATION   TXN_DATE LOT_COMPANION LOT_TXN_DATE
4  A1      100.0 2016-12-06            B5   2016-12-04
3  A2      100.0 2016-12-05            B5   2016-12-04
2  A3      100.0 2016-11-30            B4   2016-11-22
1  A4      100.0 2016-11-27            B4   2016-11-22
0  A5      100.0 2016-11-22            B4   2016-11-22
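
Note that merge_asof matches backward by default: each Input1 date is paired with the most recent Input2 date that is not later than it, which is what the expected output in the question shows. If the closest date in either direction is wanted instead, the following is a minimal sketch, assuming a pandas version in which merge_asof accepts the direction argument. The lowercase frame names and the COMPANION_LOT / COMPANION_TXN_DATE column names are chosen here for illustration only; using left_on / right_on with different key names keeps both date columns, so the map step is not needed.

```python
import pandas as pd

# Same lots and dates as in the question (OPERATION omitted for brevity).
input1 = pd.DataFrame({
    'LOT': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'TXN_DATE': pd.to_datetime(
        ['12/6/2016', '12/5/2016', '11/30/2016', '11/27/2016', '11/22/2016'],
        format='%m/%d/%Y'),
}).sort_values('TXN_DATE')

# Illustrative column names, so both date columns survive the merge.
input2 = pd.DataFrame({
    'COMPANION_LOT': ['B1', 'B2', 'B3', 'B4', 'B5', 'B6'],
    'COMPANION_TXN_DATE': pd.to_datetime(
        ['12/7/2016', '12/3/2016', '11/17/2016', '11/22/2016', '12/4/2016', '12/3/2016'],
        format='%m/%d/%Y'),
}).sort_values('COMPANION_TXN_DATE')

# Backward (the default): latest companion date that is <= TXN_DATE.
backward = pd.merge_asof(input1, input2,
                         left_on='TXN_DATE', right_on='COMPANION_TXN_DATE',
                         direction='backward').sort_values('LOT')

# Nearest: companion date with the smallest absolute delta, earlier or later.
nearest = pd.merge_asof(input1, input2,
                        left_on='TXN_DATE', right_on='COMPANION_TXN_DATE',
                        direction='nearest').sort_values('LOT')

# Signed delta from the companion date to the lot date.
nearest['DELTA'] = nearest['TXN_DATE'] - nearest['COMPANION_TXN_DATE']

print(backward)
print(nearest)
```

With this data the two directions disagree: under 'nearest', A1 (12/6) pairs with B1 (12/7, one day later) rather than B5, and A3 (11/30) pairs with one of the 12/3 lots rather than B4. B2 and B6 share the date 12/3, so which of the two is returned for A3 should not be relied on.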