I am trying to merge two dataframes on two IDs and a date range, and save the output in a new dataframe.
Consider the following example:
# first_df
FK date value1 value2 ... (more columns)
1 2019-01-01 50 50
1 2019-01-02 40 80
1 2019-01-03 80 20
1 2019-01-04 18 44
1 2019-01-05 120 50
1 2019-01-06 80 0
1 2019-01-10 60 65
1 2019-01-15 25 44
1 2019-01-25 20 20
2 2019-01-01 50 40
2 2019-01-02 80 45
2 2019-01-03 85 90
2 2019-01-08 100 10
2 2019-01-10 55 20
2 2019-01-15 80 150
...............................
# second_df
FK entityId date percentage
1 1 2019-01-01 50
1 1 2019-01-05 80
1 2 2019-01-10 40
1 2 2019-01-15 60
1 2 2019-01-20 90
2 1 2019-01-01 48
2 2 2019-01-02 40
2 2 2019-01-08 50
2 2 2019-01-20 20
......................
# output_df
FK entityId date value1 value2
1 1 2019-01-01 50% of 50 = 25 50% of 50 = 25
1 1 2019-01-02 50% of 40 = 20 50% of 80 = 40
1 1 2019-01-03 50% of 80 = 40 50% of 20 = 10
1 1 2019-01-04 50% of 18 = 9 50% of 44 = 22
1 1 2019-01-05 80% of 120 = 96 80% of 50 = 40
1 1 2019-01-06 80% of 80 = 64 80% of 0 = 0
1 1 2019-01-10 80% of 60 = 48 80% of 65 = 52
1 1 2019-01-15 80% of 25 = 20 80% of 44 = 35.2
1 1 2019-01-25 80% of 20 = 16 80% of 20 = 16
1 2 2019-01-10 40% of 60 = 24 40% of 65 = 26 # Because entityId is different, restart to iterate on first_df from 2019-01-10 using the FK: 1
1 2 2019-01-15 60% of 25 = 15 60% of 44 = 26.4
1 2 2019-01-25 90% of 20 = 18 90% of 20 = 18
2 1 2019-01-01 48% of 50 = 24 48% of 40 = 19.2 # Use FK: 2
2 1 2019-01-02 48% of 80 = 38.4 48% of 45 = 21.6
2 1 2019-01-03 48% of 85 = 40.8 48% of 90 = 43.2
2 2 2019-01-02 40% of 80 = 32 40% of 45 = 18 # Because entityId is different, restart to iterate on first_df from 2019-01-02 using the FK: 2
2 2 2019-01-03 40% of 85 = 34 40% of 90 = 36
2 2 2019-01-08 50% of 100 = 50 50% of 10 = 5
2 2 2019-01-10 50% of 55 = 27.5 50% of 20 = 10
2 2 2019-01-15 50% of 80 = 40 50% of 150 = 75
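Each (FK, entityId) combination is a pair. "FK" identifies which first_df rows the percentage applies to, and "entityId" determines which dated percentages from second_df are used. A percentage applies to every record with the same FK whose date satisfies:
second_df.date <= first_df.date
For example, for FK = 1 and entityId = 1, I apply the percentage 50 (from second_df) to the rows dated 2019-01-01 through 2019-01-04; from 2019-01-05 onward the percentage is 80.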
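To make the date rule concrete, here is a minimal sketch for the single pair (FK = 1, entityId = 1). It assumes pandas is available; the names `pct`, `vals`, and `matched` are only illustrative, and the rows are a small subset copied from the tables above.

```python
import pandas as pd

# Percentages for the pair (FK=1, entityId=1), copied from second_df above.
pct = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-05"]),
    "percentage": [50, 80],
})

# A few FK=1 rows copied from first_df above (value1 only, for brevity).
vals = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-04", "2019-01-05", "2019-01-06"]),
    "value1": [50, 18, 120, 80],
})

# direction="backward" (the default) picks, for each row of `vals`, the last
# row of `pct` whose date is <= that row's date, which is the rule
# second_df.date <= first_df.date. Both frames must be sorted by "date".
matched = pd.merge_asof(vals, pct, on="date", direction="backward")

# Apply the matched percentage to the value column.
matched["value1"] = matched["value1"] * matched["percentage"] / 100
print(matched)
```

The rows dated 2019-01-01 and 2019-01-04 pick up 50, and the rows dated 2019-01-05 and 2019-01-06 pick up 80, matching the first block of the expected output.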
Currently I am using:
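output_df = pd.merge_asof(first_df.sort_values('date'), second_df.sort_values('date'), by='FK',
                          on='date').sort_values('FK')
# Scale both value columns by the matched percentage and keep the result
output_df[['value1', 'value2']] = output_df[['value1', 'value2']].mul(output_df['percentage'] / 100, axis=0)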
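As you can see, this merge of my first and second dataframes does not use "entityId" as a key.
My problem is that I don't know how to change it so that it takes the "entityId" column into account and repeats the first_df records for every new (FK, entityId) pair. How would you do it?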
Edit 1
As requested, here is the code to generate the dataframes:
import datetime

import pandas as pd

first_data = [
[1, datetime.datetime(2019, 1, 1), 50, 50],
[1, datetime.datetime(2019, 1, 2), 40, 80],
[1, datetime.datetime(2019, 1, 3), 80, 20],
[1, datetime.datetime(2019, 1, 4), 18, 44],
[1, datetime.datetime(2019, 1, 5), 120, 50],
[1, datetime.datetime(2019, 1, 6), 80, 0],
[1, datetime.datetime(2019, 1, 10), 60, 65],
[1, datetime.datetime(2019, 1, 15), 25, 44],
[1, datetime.datetime(2019, 1, 25), 20, 20],
[2, datetime.datetime(2019, 1, 1), 50, 40],
[2, datetime.datetime(2019, 1, 2), 80, 45],
[2, datetime.datetime(2019, 1, 3), 85, 90],
[2, datetime.datetime(2019, 1, 8), 100, 10],
[2, datetime.datetime(2019, 1, 10), 55, 20],
[2, datetime.datetime(2019, 1, 15), 80, 150],
]
first_df = pd.DataFrame(first_data, columns=['FK', 'date', 'value1', 'value2'])
second_data = [
[1, 1, datetime.datetime(2019, 1, 1), 50],
[1, 1, datetime.datetime(2019, 1, 5), 80],
[1, 2, datetime.datetime(2019, 1, 10), 40],
[1, 2, datetime.datetime(2019, 1, 15), 60],
[1, 2, datetime.datetime(2019, 1, 20), 90],
[2, 1, datetime.datetime(2019, 1, 1), 48],
[2, 2, datetime.datetime(2019, 1, 2), 40],
[2, 2, datetime.datetime(2019, 1, 8), 50],
[2, 2, datetime.datetime(2019, 1, 20), 20],
]
second_df = pd.DataFrame(second_data, columns=['FK', 'entityId', 'date', 'percentage'])
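One possible direction, as a rough sketch rather than a definitive solution: first repeat each first_df row once per entityId known for its FK, then run the as-of merge keyed on both "FK" and "entityId". The block below rebuilds a reduced subset of the data so that it runs on its own. The names `pairs`, `expanded`, and `merged` are hypothetical, and dropping the rows dated before a pair's first percentage is an assumption inferred from the expected output above.

```python
import pandas as pd

# Reduced subset of the example data, rebuilt here so the sketch is self-contained.
first_df = pd.DataFrame({
    "FK": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2019-01-04", "2019-01-05", "2019-01-10", "2019-01-01", "2019-01-02"]
    ),
    "value1": [18, 120, 60, 50, 80],
    "value2": [44, 50, 65, 40, 45],
})
second_df = pd.DataFrame({
    "FK": [1, 1, 1, 2, 2],
    "entityId": [1, 1, 2, 1, 2],
    "date": pd.to_datetime(
        ["2019-01-01", "2019-01-05", "2019-01-10", "2019-01-01", "2019-01-02"]
    ),
    "percentage": [50, 80, 40, 48, 40],
})

# Step 1: repeat every first_df row once per entityId that exists for its FK.
pairs = second_df[["FK", "entityId"]].drop_duplicates()
expanded = first_df.merge(pairs, on="FK", how="inner")

# Step 2: as-of merge keyed on both FK and entityId, so each row gets the
# latest percentage of its own pair with second_df.date <= first_df.date.
merged = pd.merge_asof(
    expanded.sort_values("date"),
    second_df.sort_values("date"),
    on="date",
    by=["FK", "entityId"],
    direction="backward",
)

# Step 3 (assumption): rows dated before the pair's first percentage have no
# match (NaN) and are dropped, mirroring the "restart from ..." comments above.
output_df = merged.dropna(subset=["percentage"]).copy()

# Step 4: scale the value columns and tidy the result.
cols = ["value1", "value2"]
output_df[cols] = output_df[cols].mul(output_df["percentage"] / 100, axis=0)
output_df = (
    output_df.sort_values(["FK", "entityId", "date"])
    [["FK", "entityId", "date", "value1", "value2"]]
    .reset_index(drop=True)
)
print(output_df)
```

With this subset, the pair (1, 2) keeps only the 2019-01-10 row (40% of 60 = 24, 40% of 65 = 26), while (1, 1) keeps all three FK = 1 rows. Note that this sketch keeps every first_df row on or after a pair's first percentage date, so on the full data the pair (2, 1) would also produce rows after 2019-01-03, which the expected output above does not list.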