I have two DataFrames, df1 and df2. Their data overlaps in both the index and the columns.
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-03", end="2019-01-15")
Here is the content of df1:
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2018-12-31 3.965 152.132996 3.965 154.259995 4.15 154.979996 3.95
2019-01-02 4.620 152.744461 4.620 154.880005 4.65 155.750000 4.13
2019-01-03 4.520 147.754257 4.520 149.820007 4.62 153.259995 4.32
2019-01-04 4.530 154.075851 4.530 156.229996 4.65 157.000000 4.41
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2018-12-31 152.710007 4.140 154.470001 3829000 53015300
2019-01-02 150.880005 4.155 150.990005 2948200 58576700
2019-01-03 149.490005 4.325 152.600006 1503000 74820200
2019-01-04 151.740005 4.585 152.339996 2020700 74709300
Here is df2:
Adj Close Close High Low \
GBTC TLT GBTC TLT GBTC TLT GBTC
Date
2019-01-02 4.62 117.461304 4.62 122.150002 4.65 122.160004 4.13
2019-01-03 4.52 118.797966 4.52 123.540001 4.62 123.860001 4.32
2019-01-04 4.53 117.422844 4.53 122.110001 4.65 122.559998 4.41
2019-01-07 4.86 117.076653 4.86 121.750000 4.94 122.650002 4.74
2019-01-08 4.96 116.768936 4.96 121.430000 5.08 121.940002 4.84
2019-01-09 4.71 116.586197 4.71 121.239998 5.02 121.430000 4.63
2019-01-10 4.32 115.836174 4.32 120.459999 4.46 121.410004 4.16
2019-01-11 4.32 116.288139 4.32 120.930000 4.49 121.269997 4.25
2019-01-14 4.47 115.855431 4.47 120.480003 4.55 121.010002 4.14
Open Volume
TLT GBTC TLT GBTC TLT
Date
2019-01-02 121.339996 4.155 121.660004 2948200 19841500
2019-01-03 122.230003 4.325 122.290001 1503000 21187000
2019-01-04 121.650002 4.585 122.339996 2020700 12970200
2019-01-07 121.620003 4.740 122.620003 2676600 8498100
2019-01-08 121.389999 4.895 121.690002 2653200 7737100
2019-01-09 120.800003 5.015 121.260002 2778000 9349200
2019-01-10 120.339996 4.455 121.279999 3799800 8222900
2019-01-11 120.680000 4.410 120.830002 1218500 5786900
2019-01-14 120.239998 4.145 120.900002 2581600 6730500
How can I merge df1 and df2 into df3 so that df3 has the following content?
> df3
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 NaN 3.965 154.259995 NaN
2019-01-02 4.620 152.744461 117.461304 4.620 154.880005 122.150002
2019-01-03 4.520 147.754257 118.797966 4.520 149.820007 123.540001
2019-01-04 4.530 154.075851 117.422844 4.530 156.229996 122.110001
2019-01-07 4.860 NaN 117.076653 4.860 NaN 121.750000
2019-01-08 4.960 NaN 116.768936 4.960 NaN 121.430000
2019-01-09 4.710 NaN 116.586197 4.710 NaN 121.239998
2019-01-10 4.320 NaN 115.836174 4.320 NaN 120.459999
2019-01-11 4.320 NaN 116.288139 4.320 NaN 120.930000
2019-01-14 4.470 NaN 115.855431 4.470 NaN 120.480003
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 NaN 3.95 152.710007 NaN 4.140
2019-01-02 4.65 155.750000 122.160004 4.13 150.880005 121.339996 4.155
2019-01-03 4.62 153.259995 123.860001 4.32 149.490005 122.230003 4.325
2019-01-04 4.65 157.000000 122.559998 4.41 151.740005 121.650002 4.585
2019-01-07 4.94 NaN 122.650002 4.74 NaN 121.620003 4.740
2019-01-08 5.08 NaN 121.940002 4.84 NaN 121.389999 4.895
2019-01-09 5.02 NaN 121.430000 4.63 NaN 120.800003 5.015
2019-01-10 4.46 NaN 121.410004 4.16 NaN 120.339996 4.455
2019-01-11 4.49 NaN 121.269997 4.25 NaN 120.680000 4.410
2019-01-14 4.55 NaN 121.010002 4.14 NaN 120.239998 4.145
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 NaN 3829000 53015300.0 NaN
2019-01-02 150.990005 121.660004 2948200 58576700.0 19841500.0
2019-01-03 152.600006 122.290001 1503000 74820200.0 21187000.0
2019-01-04 152.339996 122.339996 2020700 74709300.0 12970200.0
2019-01-07 NaN 122.620003 2676600 NaN 8498100.0
2019-01-08 NaN 121.690002 2653200 NaN 7737100.0
2019-01-09 NaN 121.260002 2778000 NaN 9349200.0
2019-01-10 NaN 121.279999 3799800 NaN 8222900.0
2019-01-11 NaN 120.830002 1218500 NaN 5786900.0
2019-01-14 NaN 120.900002 2581600 NaN 6730500.0
df4 = df1.append(df2).drop_duplicates().sort_index()
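This returns a DataFrame similar to df3, but df3 and df4 are still different: in df4, each overlapping date (2019-01-02 through 2019-01-04) appears twice, once with the QQQ values and once with the TLT values.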
> df4
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 NaN 3.965 154.259995 NaN
2019-01-02 4.620 152.744461 NaN 4.620 154.880005 NaN
2019-01-02 4.620 NaN 117.461304 4.620 NaN 122.150002
2019-01-03 4.520 147.754257 NaN 4.520 149.820007 NaN
2019-01-03 4.520 NaN 118.797966 4.520 NaN 123.540001
2019-01-04 4.530 154.075851 NaN 4.530 156.229996 NaN
2019-01-04 4.530 NaN 117.422844 4.530 NaN 122.110001
2019-01-07 4.860 NaN 117.076653 4.860 NaN 121.750000
2019-01-08 4.960 NaN 116.768936 4.960 NaN 121.430000
2019-01-09 4.710 NaN 116.586197 4.710 NaN 121.239998
2019-01-10 4.320 NaN 115.836174 4.320 NaN 120.459999
2019-01-11 4.320 NaN 116.288139 4.320 NaN 120.930000
2019-01-14 4.470 NaN 115.855431 4.470 NaN 120.480003
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 NaN 3.95 152.710007 NaN 4.140
2019-01-02 4.65 155.750000 NaN 4.13 150.880005 NaN 4.155
2019-01-02 4.65 NaN 122.160004 4.13 NaN 121.339996 4.155
2019-01-03 4.62 153.259995 NaN 4.32 149.490005 NaN 4.325
2019-01-03 4.62 NaN 123.860001 4.32 NaN 122.230003 4.325
2019-01-04 4.65 157.000000 NaN 4.41 151.740005 NaN 4.585
2019-01-04 4.65 NaN 122.559998 4.41 NaN 121.650002 4.585
2019-01-07 4.94 NaN 122.650002 4.74 NaN 121.620003 4.740
2019-01-08 5.08 NaN 121.940002 4.84 NaN 121.389999 4.895
2019-01-09 5.02 NaN 121.430000 4.63 NaN 120.800003 5.015
2019-01-10 4.46 NaN 121.410004 4.16 NaN 120.339996 4.455
2019-01-11 4.49 NaN 121.269997 4.25 NaN 120.680000 4.410
2019-01-14 4.55 NaN 121.010002 4.14 NaN 120.239998 4.145
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 NaN 3829000 53015300.0 NaN
2019-01-02 150.990005 NaN 2948200 58576700.0 NaN
2019-01-02 NaN 121.660004 2948200 NaN 19841500.0
2019-01-03 152.600006 NaN 1503000 74820200.0 NaN
2019-01-03 NaN 122.290001 1503000 NaN 21187000.0
2019-01-04 152.339996 NaN 2020700 74709300.0 NaN
2019-01-04 NaN 122.339996 2020700 NaN 12970200.0
2019-01-07 NaN 122.620003 2676600 NaN 8498100.0
2019-01-08 NaN 121.690002 2653200 NaN 7737100.0
2019-01-09 NaN 121.260002 2778000 NaN 9349200.0
2019-01-10 NaN 121.279999 3799800 NaN 8222900.0
2019-01-11 NaN 120.830002 1218500 NaN 5786900.0
2019-01-14 NaN 120.900002 2581600 NaN 6730500.0
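The duplicated dates can be reproduced offline with a small synthetic example. The sketch below is only an illustration: the two miniature frames are hypothetical (a single "Close" field with rounded values typed in by hand, not downloaded data), and it uses pd.concat in place of DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical miniature frames: one field ("Close"), hand-typed rounded values.
df1 = pd.DataFrame(
    [[4.62, 154.88], [4.52, 149.82]],
    index=pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.52, 123.54], [4.53, 122.11]],
    index=pd.DatetimeIndex(["2019-01-03", "2019-01-04"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# Row-wise concatenation takes the union of the columns but keeps every row.
# The two 2019-01-03 rows differ (QQQ filled vs. TLT filled), so
# drop_duplicates, which compares whole rows, removes neither of them.
df4 = pd.concat([df1, df2]).drop_duplicates().sort_index()

print(df4)
print(df4.index.duplicated().any())  # whether any date is repeated
```

Because the rows for a shared date are never identical across all columns, value-based de-duplication cannot collapse them; the rows have to be aligned on the date instead.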
Answer 0 (score: 0)
I'm not sure this is exactly what you want, but it seems to produce df3.
pd.concat([df1.stack(), df2.stack()]).sort_values(by='Date').drop_duplicates().unstack()
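Here is the pandas documentation on reshaping, which may make the stack and unstack operations clearer. The MultiIndex in the column header can be "stacked" into the rows, and the two DataFrames can then be combined with concat. Because you fetch the same symbol in both DataFrames over overlapping date ranges, the duplicates need to be dropped. To get back to the original format, simply unstack the result.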
https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html
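The stack, concat, de-duplicate, unstack sequence can be sketched on synthetic data. This is a minimal sketch under stated assumptions, not the exact code from the answer: the frames are hypothetical miniatures with hand-typed values, and duplicates are removed by index label with Index.duplicated rather than by value with drop_duplicates. DataFrame.combine_first is shown as a separate, alternative technique that appears to give the same shape for this kind of overlap.

```python
import pandas as pd

# Hypothetical miniature frames: one field ("Close"), hand-typed rounded values.
df1 = pd.DataFrame(
    [[4.62, 154.88], [4.52, 149.82]],
    index=pd.DatetimeIndex(["2019-01-02", "2019-01-03"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "QQQ"]]),
)
df2 = pd.DataFrame(
    [[4.52, 123.54], [4.53, 122.11]],
    index=pd.DatetimeIndex(["2019-01-03", "2019-01-04"], name="Date"),
    columns=pd.MultiIndex.from_product([["Close"], ["GBTC", "TLT"]]),
)

# stack() moves the innermost column level (the symbol) into the row index,
# so each row is keyed by (Date, symbol) and the frames share the same columns.
stacked = pd.concat([df1.stack(), df2.stack()])

# GBTC on 2019-01-03 now appears twice; keep the first copy of each key.
stacked = stacked[~stacked.index.duplicated(keep="first")]

# unstack() moves the symbol level back into the columns.
df3 = stacked.sort_index().unstack()

# Alternative technique: take df1 and fill its gaps from df2,
# using the union of both indexes and both sets of columns.
alt = df1.combine_first(df2)

print(df3)
print(alt)
```

De-duplicating on the (Date, symbol) key avoids accidentally dropping two different rows that happen to hold identical values, which a value-based drop_duplicates could do.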