Question

I have a DataFrame like this:

df1:

  start_date  end_date
0 20180101    20181231
1 20170101    20171231

And another DataFrame like this:

df2:

   Type    Value
0  House   100
1  Car     200
2  Bus     300
3  House   150 
4  Car     220  
5  Bus     320

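I need to merge them so that the first row of df1 (start_date 20180101 and end_date 20181231) is applied to the first round of df2, the second row to the second round, and so on. That is, the first occurrence of "House" should get start date 20180101 and end date 20181231; the second occurrence of "House" should get start date 20170101 and end date 20171231, and so on. The result should look like this:

df3: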

   Type    Value  start_date  end_date
1  House   100    20180101    20181231
2  Car     200    20180101    20181231
3  Bus     300    20180101    20181231
4  House   150    20170101    20171231
5  Car     220    20170101    20171231
6  Bus     320    20170101    20171231

Any ideas?

Answer 1

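First, we create a rounds column in df2 that marks a new round each time House appears again.

Then we also create a rounds column in df1, with one value per row.

Finally, we merge on the rounds column: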

df2['rounds'] = df2['Type'].eq('House').cumsum()
df1['rounds'] = df1.index + 1

df2 = df2.merge(df1, on='rounds', how='left').drop(columns='rounds')

Output

    Type  Value  start_date  end_date
0  House    100    20180101  20181231
1    Car    200    20180101  20181231
2    Bus    300    20180101  20181231
3  House    150    20170101  20171231
4    Car    220    20170101  20171231
5    Bus    320    20170101  20171231

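Note:

I assume df1's index starts at 0, as shown in the question, so the +1 lines it up with the cumsum, which starts at 1. If your index starts at 1, remove the +1.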
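
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same cumsum idea. The names df1_keyed and df3 are mine, not from the answer, and numbering df1's rows by position with range(...) is a small variation I am assuming is acceptable: it avoids having to know whether df1's index starts at 0 or 1.

```python
import pandas as pd

# Frames reconstructed from the question.
df1 = pd.DataFrame({"start_date": [20180101, 20170101],
                    "end_date": [20181231, 20171231]})
df2 = pd.DataFrame({"Type": ["House", "Car", "Bus", "House", "Car", "Bus"],
                    "Value": [100, 200, 300, 150, 220, 320]})

# A new round starts at every 'House' row, so the running count of
# 'House' rows labels the rounds 1, 1, 1, 2, 2, 2.
rounds = df2["Type"].eq("House").cumsum()

# Number df1's rows 1..n by position (hypothetical helper frame),
# so the key does not depend on how df1 happens to be indexed.
df1_keyed = df1.assign(rounds=range(1, len(df1) + 1))

# Left merge keeps every df2 row in its original order.
df3 = (
    df2.assign(rounds=rounds)
       .merge(df1_keyed, on="rounds", how="left")
       .drop(columns="rounds")
)
print(df3)
```

This relies on every round beginning with a House row. If a round could start with a different Type, the cumsum key would need another marker.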

Answer 2

Let's use cumcount:

df2.assign(index=df2.groupby('Type').cumcount()).\
      merge(df1.reset_index(),on='index').drop(columns='index')
Out[59]: 
    Type  Value  start_date  end_date
0  House    100    20180101  20181231
1    Car    200    20180101  20181231
2    Bus    300    20180101  20181231
3  House    150    20170101  20171231
4    Car    220    20170101  20171231
5    Bus    320    20170101  20171231
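
A self-contained sketch of the cumcount approach, in case it helps to see the intermediate key. The column name occurrence and the frame df1_keyed are my own choices for illustration. I also use how="left" here, which is an assumption on my part, so that df2 rows with no matching df1 row would be kept rather than dropped.

```python
import pandas as pd

# Frames reconstructed from the question.
df1 = pd.DataFrame({"start_date": [20180101, 20170101],
                    "end_date": [20181231, 20171231]})
df2 = pd.DataFrame({"Type": ["House", "Car", "Bus", "House", "Car", "Bus"],
                    "Value": [100, 200, 300, 150, 220, 320]})

# cumcount numbers each Type's occurrences 0, 1, 2, ... in row order,
# which gives 0, 0, 0, 1, 1, 1 for this frame.
occurrence = df2.groupby("Type").cumcount()

# Reset df1 to a positional 0-based index, then expose that position
# as an ordinary column named 'occurrence' to merge on.
df1_keyed = (
    df1.reset_index(drop=True)
       .rename_axis("occurrence")
       .reset_index()
)

df3 = (
    df2.assign(occurrence=occurrence)
       .merge(df1_keyed, on="occurrence", how="left")
       .drop(columns="occurrence")
)
print(df3)
```

Because the key is computed per Type, this version does not need each round to start with House. It only needs the n-th occurrence of each Type to belong to the n-th round.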

Joining DataFrames sequentially
