Merging pandas DataFrames with overlapping columns

Time: 2021-02-09 15:24:28

Tags: python pandas

I have two DataFrames that can be created with the following code:

import yfinance as yf

symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

The contents of df1 and df2 are as follows:

> df1
           Adj Close              Close              High               Low  \
                GBTC         QQQ   GBTC         QQQ  GBTC         QQQ  GBTC   
Date                                                                          
2018-12-31     3.965  152.132996  3.965  154.259995  4.15  154.979996  3.95   
2019-01-02     4.620  152.744461  4.620  154.880005  4.65  155.750000  4.13   
2019-01-03     4.520  147.754257  4.520  149.820007  4.62  153.259995  4.32   
2019-01-04     4.530  154.075851  4.530  156.229996  4.65  157.000000  4.41   

                         Open               Volume            
                   QQQ   GBTC         QQQ     GBTC       QQQ  
Date                                                          
2018-12-31  152.710007  4.140  154.470001  3829000  53015300  
2019-01-02  150.880005  4.155  150.990005  2948200  58576700  
2019-01-03  149.490005  4.325  152.600006  1503000  74820200  
2019-01-04  151.740005  4.585  152.339996  2020700  74709300

> df2
           Adj Close              Close              High               Low  \
                GBTC         TLT   GBTC         TLT  GBTC         TLT  GBTC   
Date                                                                          
2018-12-31     3.965  116.845848  3.965  121.510002  4.15  121.559998  3.95   
2019-01-02     4.620  117.461304  4.620  122.150002  4.65  122.160004  4.13   
2019-01-03     4.520  118.797966  4.520  123.540001  4.62  123.860001  4.32   
2019-01-04     4.530  117.422844  4.530  122.110001  4.65  122.559998  4.41   

                         Open               Volume            
                   TLT   GBTC         TLT     GBTC       TLT  
Date                                                          
2018-12-31  120.459999  4.140  120.650002  3829000  17409000  
2019-01-02  121.339996  4.155  121.660004  2948200  19841500  
2019-01-03  122.230003  4.325  122.290001  1503000  21187000  
2019-01-04  121.650002  4.585  122.339996  2020700  12970200  

Both df1 and df2 contain the GBTC columns.

How can I merge df1 and df2 into a new DataFrame with the following contents?

> df3
           Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996  116.845848  3.965  154.259995  121.510002   
2019-01-02     4.620  152.744461  117.461304  4.620  154.880005  122.150002   
2019-01-03     4.520  147.754257  118.797966  4.520  149.820007  123.540001   
2019-01-04     4.530  154.075851  117.422844  4.530  156.229996  122.110001   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996  121.559998  3.95  152.710007  120.459999  4.140   
2019-01-02  4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155   
2019-01-03  4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325   
2019-01-04  4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585   

                                     Volume                      
                   QQQ         TLT     GBTC       QQQ       TLT  
Date                                                             
2018-12-31  154.470001  120.650002  3829000  53015300  17409000  
2019-01-02  150.990005  121.660004  2948200  58576700  19841500  
2019-01-03  152.600006  122.290001  1503000  74820200  21187000  
2019-01-04  152.339996  122.339996  2020700  74709300  12970200 

There may be more than one overlapping column.

It seems that pandas.DataFrame.merge cannot do what I want.

1 Answer:

Answer 0: (score: 1)

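  • unstack() both DataFrames so that you have two long-format frames to merge()
  • where both frames have a value, prefer the one from df2
  • pivot() to restore the original wide shape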
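import pandas as pd

# Flatten each frame to long form (one row per field, ticker and Date), then outer-merge.
dfm = pd.merge(df1.unstack().to_frame().reset_index(), df2.unstack().to_frame().reset_index(), on=["level_0","level_1","Date"],how="outer")
# Prefer df2's value ("0_y"), fall back to df1's ("0_x"), then pivot back to wide form.
(dfm.assign(**{"0_y":dfm["0_y"].fillna(dfm["0_x"])})
 .drop(columns="0_x")
 .rename(columns={"0_y":0})
 .pivot(index=["level_0","level_1"], columns="Date", values=0).T
)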
Date ('Adj Close', 'GBTC') ('Adj Close', 'QQQ') ('Adj Close', 'TLT') ('Close', 'GBTC') ('Close', 'QQQ') ('Close', 'TLT') ('High', 'GBTC') ('High', 'QQQ') ('High', 'TLT') ('Low', 'GBTC') ('Low', 'QQQ') ('Low', 'TLT') ('Open', 'GBTC') ('Open', 'QQQ') ('Open', 'TLT') ('Volume', 'GBTC') ('Volume', 'QQQ') ('Volume', 'TLT')
2019-01-02 00:00:00 4.62 152.744 117.461 4.62 154.88 122.15 4.65 155.75 122.16 4.13 150.88 121.34 4.155 150.99 121.66 2.9482e+06 5.85767e+07 1.98415e+07
2019-01-03 00:00:00 4.52 147.754 118.798 4.52 149.82 123.54 4.62 153.26 123.86 4.32 149.49 122.23 4.325 152.6 122.29 1.503e+06 7.48202e+07 2.1187e+07
2019-01-04 00:00:00 4.53 154.076 117.423 4.53 156.23 122.11 4.65 157 122.56 4.41 151.74 121.65 4.585 152.34 122.34 2.0207e+06 7.47093e+07 1.29702e+07
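
As an alternative to the unstack/merge/pivot route, DataFrame.combine_first may be enough when the overlapping columns hold the same data in both frames. This is not the method from the answer, only a minimal sketch under stated assumptions: the two small frames below are synthetic stand-ins built from a few values in the question (two price fields, two dates) so that the example runs without yfinance or network access.

```python
import pandas as pd

# Synthetic stand-ins for df1 and df2 (assumption: same two-level column
# layout as the yfinance output, reduced to "Close" and "Open" for brevity).
dates = pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date")

df1 = pd.DataFrame(
    {
        ("Close", "GBTC"): [4.62, 4.52],
        ("Close", "QQQ"): [154.880005, 149.820007],
        ("Open", "GBTC"): [4.155, 4.325],
        ("Open", "QQQ"): [150.990005, 152.600006],
    },
    index=dates,
)
df2 = pd.DataFrame(
    {
        ("Close", "GBTC"): [4.62, 4.52],
        ("Close", "TLT"): [122.150002, 123.540001],
        ("Open", "GBTC"): [4.155, 4.325],
        ("Open", "TLT"): [121.660004, 122.290001],
    },
    index=dates,
)

# combine_first aligns on the row index and on both column levels.
# Where both frames have a non-null value, df2's value is kept;
# df1 only fills cells that df2 does not have.
df3 = df2.combine_first(df1).sort_index(axis=1)

print(df3)
```

Calling it as df2.combine_first(df1) keeps the answer's preference for df2 values; swapping the two frames would prefer df1 instead. The explicit sort_index(axis=1) is there only to make the column order deterministic.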