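Merging pandas DataFrames with overlapping index and columns

Time: 2021-02-09 16:13:08

Tags: python pandas

I have two DataFrames, df1 and df2. Their data overlaps in both the index and the columns.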

import yfinance as yf

symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")

symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-03", end="2019-01-15")

Here is the content of df1:

           Adj Close              Close              High               Low  \
                GBTC         QQQ   GBTC         QQQ  GBTC         QQQ  GBTC   
Date                                                                          
2018-12-31     3.965  152.132996  3.965  154.259995  4.15  154.979996  3.95   
2019-01-02     4.620  152.744461  4.620  154.880005  4.65  155.750000  4.13   
2019-01-03     4.520  147.754257  4.520  149.820007  4.62  153.259995  4.32   
2019-01-04     4.530  154.075851  4.530  156.229996  4.65  157.000000  4.41   

                         Open               Volume            
                   QQQ   GBTC         QQQ     GBTC       QQQ  
Date                                                          
2018-12-31  152.710007  4.140  154.470001  3829000  53015300  
2019-01-02  150.880005  4.155  150.990005  2948200  58576700  
2019-01-03  149.490005  4.325  152.600006  1503000  74820200  
2019-01-04  151.740005  4.585  152.339996  2020700  74709300 

Here is df2:

           Adj Close             Close              High               Low  \
                GBTC         TLT  GBTC         TLT  GBTC         TLT  GBTC   
Date                                                                         
2019-01-02      4.62  117.461304  4.62  122.150002  4.65  122.160004  4.13   
2019-01-03      4.52  118.797966  4.52  123.540001  4.62  123.860001  4.32   
2019-01-04      4.53  117.422844  4.53  122.110001  4.65  122.559998  4.41   
2019-01-07      4.86  117.076653  4.86  121.750000  4.94  122.650002  4.74   
2019-01-08      4.96  116.768936  4.96  121.430000  5.08  121.940002  4.84   
2019-01-09      4.71  116.586197  4.71  121.239998  5.02  121.430000  4.63   
2019-01-10      4.32  115.836174  4.32  120.459999  4.46  121.410004  4.16   
2019-01-11      4.32  116.288139  4.32  120.930000  4.49  121.269997  4.25   
2019-01-14      4.47  115.855431  4.47  120.480003  4.55  121.010002  4.14   

                         Open               Volume            
                   TLT   GBTC         TLT     GBTC       TLT  
Date                                                          
2019-01-02  121.339996  4.155  121.660004  2948200  19841500  
2019-01-03  122.230003  4.325  122.290001  1503000  21187000  
2019-01-04  121.650002  4.585  122.339996  2020700  12970200  
2019-01-07  121.620003  4.740  122.620003  2676600   8498100  
2019-01-08  121.389999  4.895  121.690002  2653200   7737100  
2019-01-09  120.800003  5.015  121.260002  2778000   9349200  
2019-01-10  120.339996  4.455  121.279999  3799800   8222900  
2019-01-11  120.680000  4.410  120.830002  1218500   5786900  
2019-01-14  120.239998  4.145  120.900002  2581600   6730500 

How can I merge df1 and df2 into df3 so that df3 has the following content?

> df3
         Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996         NaN  3.965  154.259995         NaN 
2019-01-02     4.620  152.744461  117.461304  4.620  154.880005  122.150002    
2019-01-03     4.520  147.754257  118.797966  4.520  149.820007  123.540001 
2019-01-04     4.530  154.075851  117.422844  4.530  156.229996  122.110001   
2019-01-07     4.860         NaN  117.076653  4.860         NaN  121.750000   
2019-01-08     4.960         NaN  116.768936  4.960         NaN  121.430000   
2019-01-09     4.710         NaN  116.586197  4.710         NaN  121.239998   
2019-01-10     4.320         NaN  115.836174  4.320         NaN  120.459999   
2019-01-11     4.320         NaN  116.288139  4.320         NaN  120.930000   
2019-01-14     4.470         NaN  115.855431  4.470         NaN  120.480003   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996         NaN  3.95  152.710007         NaN  4.140    
2019-01-02  4.65  155.750000  122.160004  4.13  150.880005  121.339996  4.155     
2019-01-03  4.62  153.259995  123.860001  4.32  149.490005  122.230003  4.325    
2019-01-04  4.65  157.000000  122.559998  4.41  151.740005  121.650002  4.585   
2019-01-07  4.94         NaN  122.650002  4.74         NaN  121.620003  4.740   
2019-01-08  5.08         NaN  121.940002  4.84         NaN  121.389999  4.895   
2019-01-09  5.02         NaN  121.430000  4.63         NaN  120.800003  5.015   
2019-01-10  4.46         NaN  121.410004  4.16         NaN  120.339996  4.455   
2019-01-11  4.49         NaN  121.269997  4.25         NaN  120.680000  4.410   
2019-01-14  4.55         NaN  121.010002  4.14         NaN  120.239998  4.145   

                                     Volume                          
                   QQQ         TLT     GBTC         QQQ         TLT  
Date                                                                 
2018-12-31  154.470001         NaN  3829000  53015300.0         NaN  
2019-01-02  150.990005  121.660004  2948200  58576700.0  19841500.0  
2019-01-03  152.600006  122.290001  1503000  74820200.0  21187000.0  
2019-01-04  152.339996  122.339996  2020700  74709300.0  12970200.0  
2019-01-07         NaN  122.620003  2676600         NaN   8498100.0  
2019-01-08         NaN  121.690002  2653200         NaN   7737100.0  
2019-01-09         NaN  121.260002  2778000         NaN   9349200.0  
2019-01-10         NaN  121.279999  3799800         NaN   8222900.0  
2019-01-11         NaN  120.830002  1218500         NaN   5786900.0  
2019-01-14         NaN  120.900002  2581600         NaN   6730500.0 

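df4 = df1.append(df2).drop_duplicates().sort_index() returns a DataFrame similar to df3.

But df3 and df4 are still different: in df4 each overlapping date appears twice, once with the QQQ values and once with the TLT values.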

> df4
           Adj Close                          Close                          \
                GBTC         QQQ         TLT   GBTC         QQQ         TLT   
Date                                                                          
2018-12-31     3.965  152.132996         NaN  3.965  154.259995         NaN   
2019-01-02     4.620  152.744461         NaN  4.620  154.880005         NaN   
2019-01-02     4.620         NaN  117.461304  4.620         NaN  122.150002   
2019-01-03     4.520  147.754257         NaN  4.520  149.820007         NaN   
2019-01-03     4.520         NaN  118.797966  4.520         NaN  123.540001   
2019-01-04     4.530  154.075851         NaN  4.530  156.229996         NaN   
2019-01-04     4.530         NaN  117.422844  4.530         NaN  122.110001   
2019-01-07     4.860         NaN  117.076653  4.860         NaN  121.750000   
2019-01-08     4.960         NaN  116.768936  4.960         NaN  121.430000   
2019-01-09     4.710         NaN  116.586197  4.710         NaN  121.239998   
2019-01-10     4.320         NaN  115.836174  4.320         NaN  120.459999   
2019-01-11     4.320         NaN  116.288139  4.320         NaN  120.930000   
2019-01-14     4.470         NaN  115.855431  4.470         NaN  120.480003   

            High                           Low                           Open  \
            GBTC         QQQ         TLT  GBTC         QQQ         TLT   GBTC   
Date                                                                            
2018-12-31  4.15  154.979996         NaN  3.95  152.710007         NaN  4.140   
2019-01-02  4.65  155.750000         NaN  4.13  150.880005         NaN  4.155   
2019-01-02  4.65         NaN  122.160004  4.13         NaN  121.339996  4.155   
2019-01-03  4.62  153.259995         NaN  4.32  149.490005         NaN  4.325   
2019-01-03  4.62         NaN  123.860001  4.32         NaN  122.230003  4.325   
2019-01-04  4.65  157.000000         NaN  4.41  151.740005         NaN  4.585   
2019-01-04  4.65         NaN  122.559998  4.41         NaN  121.650002  4.585   
2019-01-07  4.94         NaN  122.650002  4.74         NaN  121.620003  4.740   
2019-01-08  5.08         NaN  121.940002  4.84         NaN  121.389999  4.895   
2019-01-09  5.02         NaN  121.430000  4.63         NaN  120.800003  5.015   
2019-01-10  4.46         NaN  121.410004  4.16         NaN  120.339996  4.455   
2019-01-11  4.49         NaN  121.269997  4.25         NaN  120.680000  4.410   
2019-01-14  4.55         NaN  121.010002  4.14         NaN  120.239998  4.145   

                                     Volume                          
                   QQQ         TLT     GBTC         QQQ         TLT  
Date                                                                 
2018-12-31  154.470001         NaN  3829000  53015300.0         NaN  
2019-01-02  150.990005         NaN  2948200  58576700.0         NaN  
2019-01-02         NaN  121.660004  2948200         NaN  19841500.0  
2019-01-03  152.600006         NaN  1503000  74820200.0         NaN  
2019-01-03         NaN  122.290001  1503000         NaN  21187000.0  
2019-01-04  152.339996         NaN  2020700  74709300.0         NaN  
2019-01-04         NaN  122.339996  2020700         NaN  12970200.0  
2019-01-07         NaN  122.620003  2676600         NaN   8498100.0  
2019-01-08         NaN  121.690002  2653200         NaN   7737100.0  
2019-01-09         NaN  121.260002  2778000         NaN   9349200.0  
2019-01-10         NaN  121.279999  3799800         NaN   8222900.0  
2019-01-11         NaN  120.830002  1218500         NaN   5786900.0  
2019-01-14         NaN  120.900002  2581600         NaN   6730500.0

1 answer:

Answer 0: (score: 0)

Not sure if this is what you want, but this seems to give df3:

pd.concat([df1.stack(), df2.stack()]).sort_values(by='Date').drop_duplicates().unstack()
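To try the one-liner above without downloading anything, here is a minimal sketch on made-up data. The frames `full`, `df1` and `df2` and their values are assumptions for illustration only; they just mimic the (field, symbol) column layout and the overlapping dates and symbols from the question.

```python
import numpy as np
import pandas as pd

# Made-up data: 5 dates, 2 fields, 3 symbols (values are arbitrary).
dates = pd.date_range("2019-01-01", periods=5, freq="D", name="Date")
cols = pd.MultiIndex.from_product([["Close", "Volume"], ["GBTC", "QQQ", "TLT"]])
full = pd.DataFrame(np.arange(30, dtype=float).reshape(5, 6), index=dates, columns=cols)

symbol = full.columns.get_level_values(1)
# df1: first 3 dates, GBTC and QQQ; df2: last 4 dates, GBTC and TLT.
df1 = full.iloc[:3].loc[:, symbol.isin(["GBTC", "QQQ"])]
df2 = full.iloc[1:].loc[:, symbol.isin(["GBTC", "TLT"])]

# Same chain as the one-liner: move the symbol level into the rows,
# concatenate, drop the repeated GBTC rows, then move the symbols back.
df3 = (
    pd.concat([df1.stack(), df2.stack()])
    .sort_values(by="Date")
    .drop_duplicates()
    .unstack()
)
print(df3)
```

Note that `drop_duplicates()` compares row values only, not index labels, so this relies on the overlapping rows being identical in both frames and on no two distinct rows happening to share the same values.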

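Below is the pandas documentation on reshaping, which may make the stack and unstack operations clearer. The MultiIndex in the column header can be "stacked" into the rows, and concat then joins the two DataFrames together. Because you fetch the same symbol in both DataFrames over an overlapping date range, the duplicates have to be dropped. Then, to get back to the original format, just unstack.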

https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
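As a possible alternative to stacking (not part of the answer above), `DataFrame.combine_first` aligns on the union of both index and columns and fills gaps in one frame from the other. A minimal sketch on made-up data follows; the frames and values are assumptions for illustration, and where both frames hold a non-null value for the same cell, the value from `df1` is kept.

```python
import numpy as np
import pandas as pd

# Made-up data: 5 dates, 2 fields, 3 symbols (values are arbitrary).
dates = pd.date_range("2019-01-01", periods=5, freq="D", name="Date")
cols = pd.MultiIndex.from_product([["Close", "Volume"], ["GBTC", "QQQ", "TLT"]])
full = pd.DataFrame(np.arange(30, dtype=float).reshape(5, 6), index=dates, columns=cols)

symbol = full.columns.get_level_values(1)
# df1: first 3 dates, GBTC and QQQ; df2: last 4 dates, GBTC and TLT.
df1 = full.iloc[:3].loc[:, symbol.isin(["GBTC", "QQQ"])]
df2 = full.iloc[1:].loc[:, symbol.isin(["GBTC", "TLT"])]

# Union of rows and columns; df1 values win, df2 fills what df1 lacks.
df3 = df1.combine_first(df2)
print(df3)
```

This keeps one row per date and leaves NaN only where neither frame has data, for example QQQ on dates that only df2 covers.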