Pandas - join by time proximity

Time: 2015-09-24 18:22:41

Tags: python join pandas timestamp dataframe

I have two DataFrames, left_df and right_df, each with a column of datetimes. I want to join them so that for each row R in left_df, I find the row in right_df whose timestamp is closest to R's timestamp, and put the two rows together. I don't know in advance whether the matching row in right_df comes before or after R in time.

For example:

left_df =
                     left_dt left_flag
0 2014-08-23 07:57:03.827516      True
1 2014-08-23 09:27:12.831126     False
2 2014-08-23 11:55:27.551029      True
3 2014-08-23 16:11:33.511049      True

right_df =
                    right_dt right_flag
0 2014-08-23 07:12:52.805870       True
1 2014-08-23 15:12:34.815087       True

desired output_df =
                     left_dt left_flag                   right_dt right_flag
0 2014-08-23 07:57:03.827516      True 2014-08-23 07:12:52.805870       True
1 2014-08-23 09:27:12.831126     False 2014-08-23 07:12:52.805870       True
2 2014-08-23 11:55:27.551029      True 2014-08-23 15:12:34.815087       True
3 2014-08-23 16:11:33.511049      True 2014-08-23 15:12:34.815087       True
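To make the matching rule concrete, here is a brute-force sketch. The helper name nearest_join is hypothetical (it is not a pandas API). It computes every pairwise time difference with NumPy broadcasting and takes the argmin for each left row, so it needs len(left) x len(right) memory and only suits small frames.

```python
import numpy as np
import pandas as pd


def nearest_join(left, right, left_on, right_on):
    """Hypothetical helper: pair each row of `left` with the row of `right`
    whose timestamp is closest (earlier or later). Brute force, O(n*m) memory."""
    lt = left[left_on].to_numpy()    # datetime64[ns] array
    rt = right[right_on].to_numpy()
    # |left_i - right_j| for every pair, shape (len(left), len(right))
    diff = np.abs(lt[:, None] - rt[None, :])
    idx = diff.argmin(axis=1)        # closest right row per left row (ties -> first)
    matched = right.iloc[idx].reset_index(drop=True)
    return pd.concat([left.reset_index(drop=True), matched], axis=1)


left_df = pd.DataFrame({
    'left_dt': pd.to_datetime(['2014-08-23 07:57:03.827516',
                               '2014-08-23 09:27:12.831126',
                               '2014-08-23 11:55:27.551029',
                               '2014-08-23 16:11:33.511049']),
    'left_flag': [True, False, True, True]})
right_df = pd.DataFrame({
    'right_dt': pd.to_datetime(['2014-08-23 07:12:52.80587',
                                '2014-08-23 15:12:34.815087']),
    'right_flag': [True, True]})

result = nearest_join(left_df, right_df, 'left_dt', 'right_dt')
print(result)
```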


1 answer:

Answer 0 (score: 0)

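I'm not sure it works in all cases, but I think this could be a solution. The idea is to outer-join the two frames on their timestamps, forward-fill the gaps, and, for any left row that ends up with two candidate right rows, keep the one with the smaller time difference.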

import pandas as pd

# Test data
left_df = pd.DataFrame({'left_dt': ['2014-08-23 07:57:03.827516',
  '2014-08-23 09:27:12.831126',
  '2014-08-23 11:55:27.551029',
  '2014-08-23 16:11:33.511049'],
 'left_flag': [True, False, True, True]})
left_df['left_dt'] = pd.to_datetime(left_df['left_dt'])


right_df = pd.DataFrame(
{'right_dt': ['2014-08-23 07:12:52.80587', '2014-08-23 15:12:34.815087'],
 'right_flag': [True, True]})
right_df['right_dt'] = pd.to_datetime(right_df['right_dt'])


# Setting the date as the index for each DataFrame
left_df.set_index('left_dt', drop=False, inplace=True)
right_df.set_index('right_dt', drop=False, inplace=True)

# Merging them and filling the gaps
output_df = left_df.join(right_df, how='outer').sort_index()
output_df = output_df.ffill()
# Dropping rows with no left value (right rows that precede the first left row)
output_df.dropna(subset=['left_dt'], inplace=True)
# Computing the absolute time difference so that, among duplicated left rows, the one with the greatest diff can be dropped
output_df['diff'] = abs(output_df['left_dt'] - output_df['right_dt'])
output_df.sort_values('diff', inplace=True)
output_df.drop_duplicates(subset=['left_dt'], inplace=True)
# Bringing back the index
output_df.sort_index(inplace=True)
output_df = output_df.reset_index(drop=True)
# Dropping the helper column
output_df.drop('diff', axis=1, inplace=True)
print(output_df)

                     left_dt left_flag                   right_dt right_flag
0 2014-08-23 07:57:03.827516      True 2014-08-23 07:12:52.805870       True
1 2014-08-23 09:27:12.831126     False 2014-08-23 07:12:52.805870       True
2 2014-08-23 11:55:27.551029      True 2014-08-23 15:12:34.815087       True
3 2014-08-23 16:11:33.511049      True 2014-08-23 15:12:34.815087       True
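A shorter alternative, assuming pandas 0.20 or later (the version in which pd.merge_asof gained its direction parameter): merge_asof with direction='nearest' pairs each left row with the right row whose key is closest, looking both backward and forward. Both inputs must be sorted by their key columns. It also covers a case the forward-fill approach misses: with right rows at t=0 and t=100 and left rows at t=80 and t=90, forward-filling pairs t=80 with t=0, although t=100 is closer.

```python
import pandas as pd

# Same test data as in the answer
left_df = pd.DataFrame({
    'left_dt': pd.to_datetime(['2014-08-23 07:57:03.827516',
                               '2014-08-23 09:27:12.831126',
                               '2014-08-23 11:55:27.551029',
                               '2014-08-23 16:11:33.511049']),
    'left_flag': [True, False, True, True]})
right_df = pd.DataFrame({
    'right_dt': pd.to_datetime(['2014-08-23 07:12:52.80587',
                                '2014-08-23 15:12:34.815087']),
    'right_flag': [True, True]})

# merge_asof requires both inputs to be sorted on their join keys
left_df = left_df.sort_values('left_dt')
right_df = right_df.sort_values('right_dt')

# direction='nearest' considers right rows both before and after each left row
output_df = pd.merge_asof(left_df, right_df,
                          left_on='left_dt', right_on='right_dt',
                          direction='nearest')
print(output_df)
```

If matches farther away than some limit should be left empty, merge_asof also accepts a tolerance argument, e.g. tolerance=pd.Timedelta('1h').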