Merging two dataframes between dates using 2 IDs

Time: 2019-12-16 12:42:55

Tags: python pandas dataframe pandas-groupby

I am trying to merge two dataframes between two dates using 2 IDs, and then save the output in a new dataframe.

Consider the following example:

# first_df 
FK    date          value1   value2 ... (more columns)
1     2019-01-01    50       50
1     2019-01-02    40       80
1     2019-01-03    80       20
1     2019-01-04    18       44
1     2019-01-05    120      50
1     2019-01-06    80       0
1     2019-01-10    60       65
1     2019-01-15    25       44
1     2019-01-25    20       20
2     2019-01-01    50       40
2     2019-01-02    80       45
2     2019-01-03    85       90
2     2019-01-08    100      10
2     2019-01-10    55       20
2     2019-01-15    80       150
...............................


# second_df
FK   entityId    date          percentage
1    1           2019-01-01    50
1    1           2019-01-05    80
1    2           2019-01-10    40
1    2           2019-01-15    60
1    2           2019-01-20    90
2    1           2019-01-01    48
2    2           2019-01-02    40
2    2           2019-01-08    50
2    2           2019-01-20    20
......................


# output_df
FK    entityId    date          value1            value2
1     1           2019-01-01    50% of 50 = 25    50% of 50 = 25
1     1           2019-01-02    50% of 40 = 20    50% of 80 = 40
1     1           2019-01-03    50% of 80 = 40    50% of 20 = 10
1     1           2019-01-04    50% of 18 = 9     50% of 44 = 22
1     1           2019-01-05    80% of 120 = 96   80% of 50 = 40
1     1           2019-01-06    80% of 80 = 64    80% of 0 = 0
1     1           2019-01-10    80% of 60 = 48    80% of 65 = 52
1     1           2019-01-15    80% of 25 = 20    80% of 44 = 35.2
1     1           2019-01-25    80% of 20 = 16    80% of 20 = 16
1     2           2019-01-10    40% of 60 = 24    40% of 65 = 26     # entityId is different, so restart iterating over first_df from 2019-01-10 using FK: 1
1     2           2019-01-15    60% of 25 = 15    60% of 44 = 26.4  
1     2           2019-01-25    90% of 20 = 18    90% of 20 = 18
2     1           2019-01-01    48% of 50 = 24    48% of 40 = 19.2   # Use FK: 2
2     1           2019-01-02    48% of 80 = 38.4  48% of 45 = 21.6
2     1           2019-01-03    48% of 85 = 40.8  48% of 90 = 43.2
2     2           2019-01-02    40% of 80 = 32    40% of 45 = 18     # entityId is different, so restart iterating over first_df from 2019-01-02 using FK: 2
2     2           2019-01-03    40% of 85 = 34    40% of 90 = 36
2     2           2019-01-08    50% of 100 = 50   50% of 10 = 5
2     2           2019-01-10    50% of 55 = 27.5  50% of 20 = 10
2     2           2019-01-15    50% of 80 = 40    50% of 150 = 75

(FK, entityId) is my key pair. "FK" is used to find all the values the percentage applies to, and "entityId" is used to check the dates in first_df and second_df.

The percentage is applied to all records with the same FK whose dates satisfy: second_df.date <= first_df.date

For example, for FK = 1 and entityId = 1, I apply the percentage 50 (from second_df) between 2019-01-01 and 2019-01-04.
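To illustrate the date rule for a single pair, here is a minimal sketch restricted to FK = 1 and entityId = 1. The frame names `values` and `percentages` and the `scaled` column are made up for this illustration. It relies on `pd.merge_asof`, whose default backward direction picks, for each value row, the latest percentage row dated on or before it.

```python
import pandas as pd

# Hypothetical slice of first_df: FK = 1, first six dates, value1 only.
values = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03",
        "2019-01-04", "2019-01-05", "2019-01-06",
    ]),
    "value1": [50, 40, 80, 18, 120, 80],
})

# Hypothetical slice of second_df: FK = 1, entityId = 1.
percentages = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-05"]),
    "percentage": [50, 80],
})

# Backward as-of merge: each value row gets the most recent percentage
# with percentages.date <= values.date. Both frames are already sorted by date.
matched = pd.merge_asof(values, percentages, on="date")

# Apply the matched percentage to the value column.
matched["scaled"] = matched["value1"] * matched["percentage"] / 100
print(matched)
```

Under these assumptions, the rows from 2019-01-01 to 2019-01-04 are scaled by 50 and the rows from 2019-01-05 onward by 80.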

I am currently using:

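output_df = pd.merge_asof(first_df.sort_values('date'), second_df.sort_values('date'), by='FK',
                          on='date').sort_values('FK')
# Scale the value columns by the matched percentage and keep the result
output_df[['value1', 'value2']] = output_df[['value1', 'value2']].mul(output_df.percentage / 100, axis=0)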

to merge my first and second dataframes (as you can see, it does not take "entityId" into account).

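My problem is that I don't know how to change it to include the "entityId" column and to repeat the records every time (FK, entityId) is a new pair. How would you do that?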
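One possible direction, offered as a sketch rather than a confirmed answer: first repeat every first_df row once per entityId known for its FK, then run `pd.merge_asof` with both `FK` and `entityId` in `by`, and drop the rows dated before a pair's first percentage. The helper name `apply_percentages` is hypothetical, and the data is rebuilt from the tables in the question so that the block runs on its own.

```python
import pandas as pd

first_df = pd.DataFrame({
    "FK": [1] * 9 + [2] * 6,
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-05",
        "2019-01-06", "2019-01-10", "2019-01-15", "2019-01-25",
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-08", "2019-01-10",
        "2019-01-15",
    ]),
    "value1": [50, 40, 80, 18, 120, 80, 60, 25, 20, 50, 80, 85, 100, 55, 80],
    "value2": [50, 80, 20, 44, 50, 0, 65, 44, 20, 40, 45, 90, 10, 20, 150],
})
second_df = pd.DataFrame({
    "FK": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "entityId": [1, 1, 2, 2, 2, 1, 2, 2, 2],
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-05", "2019-01-10", "2019-01-15", "2019-01-20",
        "2019-01-01", "2019-01-02", "2019-01-08", "2019-01-20",
    ]),
    "percentage": [50, 80, 40, 60, 90, 48, 40, 50, 20],
})


def apply_percentages(first_df, second_df, value_cols=("value1", "value2")):
    """Hypothetical helper: scale value columns per (FK, entityId) pair."""
    cols = list(value_cols)

    # Step 1: repeat each first_df row once per entityId that exists for its FK.
    pairs = second_df[["FK", "entityId"]].drop_duplicates()
    expanded = first_df.merge(pairs, on="FK", how="inner")

    # Step 2: backward as-of merge within each (FK, entityId) pair, so every
    # row gets the latest percentage with second_df.date <= first_df.date.
    merged = pd.merge_asof(
        expanded.sort_values("date"),
        second_df.sort_values("date"),
        on="date",
        by=["FK", "entityId"],
    )

    # Step 3: rows dated before a pair's first percentage have no match; drop them.
    merged = merged.dropna(subset=["percentage"]).copy()

    # Step 4: apply the matched percentage to the value columns.
    merged[cols] = merged[cols].mul(merged["percentage"] / 100, axis=0)

    merged = merged.sort_values(["FK", "entityId", "date"]).reset_index(drop=True)
    return merged[["FK", "entityId", "date"] + cols]


output_df = apply_percentages(first_df, second_df)
print(output_df)
```

Note one assumption in this reading: a pair keeps receiving rows for every later first_df date of its FK, so the pair (FK = 2, entityId = 1) gets all six FK 2 dates, whereas the expected output in the question lists only the first three. The question does not state a rule for where a pair should stop, so that cutoff would need to be added separately.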

Edit 1

As requested, here is the code to generate the dataframes:

    import datetime

    import pandas as pd

    first_data = [
        [1, datetime.datetime(2019, 1, 1), 50, 50],
        [1, datetime.datetime(2019, 1, 2), 40, 80],
        [1, datetime.datetime(2019, 1, 3), 80, 20],
        [1, datetime.datetime(2019, 1, 4), 18, 44],
        [1, datetime.datetime(2019, 1, 5), 120, 50],
        [1, datetime.datetime(2019, 1, 6), 80, 0],
        [1, datetime.datetime(2019, 1, 10), 60, 65],
        [1, datetime.datetime(2019, 1, 15), 25, 44],
        [1, datetime.datetime(2019, 1, 25), 20, 20],
        [2, datetime.datetime(2019, 1, 1), 50, 40],
        [2, datetime.datetime(2019, 1, 2), 80, 45],
        [2, datetime.datetime(2019, 1, 3), 85, 90],
        [2, datetime.datetime(2019, 1, 8), 100, 10],
        [2, datetime.datetime(2019, 1, 10), 55, 20],
        [2, datetime.datetime(2019, 1, 15), 80, 150],
    ]
    first_df = pd.DataFrame(first_data, columns=['FK', 'date', 'value1', 'value2'])

    # The last four rows use FK 2, matching the second_df table above
    second_data = [
        [1, 1, datetime.datetime(2019, 1, 1), 50],
        [1, 1, datetime.datetime(2019, 1, 5), 80],
        [1, 2, datetime.datetime(2019, 1, 10), 40],
        [1, 2, datetime.datetime(2019, 1, 15), 60],
        [1, 2, datetime.datetime(2019, 1, 20), 90],
        [2, 1, datetime.datetime(2019, 1, 1), 48],
        [2, 2, datetime.datetime(2019, 1, 2), 40],
        [2, 2, datetime.datetime(2019, 1, 8), 50],
        [2, 2, datetime.datetime(2019, 1, 20), 20],
    ]
    second_df = pd.DataFrame(second_data, columns=['FK', 'entityId', 'date', 'percentage'])

0 answers:

No answers