Pandas: combine dataframes by stacking columns and values on matching keys

Time: 2021-03-24 23:16:34

Tags: python pandas dataframe split-apply-combine

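I want to combine dataframes 1 and 2 in the following way:

  • The Date column is the first key
  • The second key is the column header of dataframe 1 and the Project variable of dataframe 2
  • In the new dataframe, V1 holds the value from dataframe 1 for which these keys match
  • If the keys do not match, the S1, S2 and S3 values are empty (e.g. row 0)
  • If the keys match, the S1, S2 and S3 values are joined in from dataframe 2 (e.g. rows 1, 2 and 3)

I tried various combinations of stacking to get this result, but could not make it work. Any ideas?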

**Dataframe 1**
     Date           C0          C1       C2        C3
0  2021-03-24  2547.502499  220.815585  91.2  10.764182
1  2021-02-01  2147.502499  219.815585  62.2   8.764182

**Dataframe 2**
    Project   Date          S1    S2         S3
0      C1  2021-03-24  151.733282  67.2   1.882302
1      C1  2021-02-01     150.1  60.2     0.812302
2      C2  2021-03-24   15.15005  50.9  25.200000

**Expected Result**
     Date     Project    V1           S1         S2       S3
0  2021-03-24   C0   2547.502499     NaN         NaN      NaN
1  2021-03-24   C1   220.815585   151.733282    67.2   1.882302
2  2021-03-24   C2      62.2       15.15005     50.9  25.200000
3  2021-02-01   C1   219.815585      150.1      60.2   0.812302
...

1 answer:

Answer 0 (score: 1)

Use stack followed by merge:

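# Reshape df1 from wide to long, then left-join df2 on both keys
(df1.set_index('Date')
    .stack()           # headers C0..C3 become a second index level
    .reset_index()     # yields the columns 'Date', 'level_1' and 0
    .rename(columns={'level_1': 'Project', 0: 'V1'})
    .merge(df2, on=['Date', 'Project'], how='left')  # unmatched keys get NaN
)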

Output:

    Date        Project            V1        S1     S2          S3
--  ----------  ---------  ----------  --------  -----  ----------
 0  2021-03-24  C0         2547.5      nan       nan    nan
 1  2021-03-24  C1          220.816    151.733    67.2    1.8823
 2  2021-03-24  C2           91.2       15.1501   50.9   25.2
 3  2021-03-24  C3           10.7642   nan       nan    nan
 4  2021-02-01  C0         2147.5      nan       nan    nan
 5  2021-02-01  C1          219.816    150.1      60.2    0.812302
 6  2021-02-01  C2           62.2      nan       nan    nan
 7  2021-02-01  C3            8.76418  nan       nan    nan
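
For anyone who wants to reproduce this, here is a self-contained sketch of the same idea. It rebuilds the two frames from the question (assuming Date is kept as plain strings) and swaps stack for melt, which names the new columns directly and so skips the rename of 'level_1' and 0. The variable names long_df1 and result are only illustrative.

```python
import pandas as pd

# Sample frames copied from the question (Date kept as plain strings, an assumption).
df1 = pd.DataFrame({
    "Date": ["2021-03-24", "2021-02-01"],
    "C0": [2547.502499, 2147.502499],
    "C1": [220.815585, 219.815585],
    "C2": [91.2, 62.2],
    "C3": [10.764182, 8.764182],
})
df2 = pd.DataFrame({
    "Project": ["C1", "C1", "C2"],
    "Date": ["2021-03-24", "2021-02-01", "2021-03-24"],
    "S1": [151.733282, 150.1, 15.15005],
    "S2": [67.2, 60.2, 50.9],
    "S3": [1.882302, 0.812302, 25.2],
})

# melt turns the C0..C3 columns into (Project, V1) rows in one step.
long_df1 = df1.melt(id_vars="Date", var_name="Project", value_name="V1")

# A left join keeps every (Date, Project) pair from df1;
# pairs that are missing in df2 get NaN in S1, S2 and S3.
result = long_df1.merge(df2, on=["Date", "Project"], how="left")
print(result)
```

The rows come out grouped by project rather than by date, because melt walks the columns one at a time. If the date-first order of the stack version matters, a sort_values on the two key columns afterwards should restore something close to it.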