I have two DataFrames that can be created with the following code:
import yfinance as yf
symbols = ['QQQ', 'GBTC']
df1 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
symbols = ['GBTC', 'TLT']
df2 = yf.download(symbols, start="2019-01-01", end="2019-01-07")
The contents of df1 and df2 are as follows:
> df1
Adj Close Close High Low \
GBTC QQQ GBTC QQQ GBTC QQQ GBTC
Date
2018-12-31 3.965 152.132996 3.965 154.259995 4.15 154.979996 3.95
2019-01-02 4.620 152.744461 4.620 154.880005 4.65 155.750000 4.13
2019-01-03 4.520 147.754257 4.520 149.820007 4.62 153.259995 4.32
2019-01-04 4.530 154.075851 4.530 156.229996 4.65 157.000000 4.41
Open Volume
QQQ GBTC QQQ GBTC QQQ
Date
2018-12-31 152.710007 4.140 154.470001 3829000 53015300
2019-01-02 150.880005 4.155 150.990005 2948200 58576700
2019-01-03 149.490005 4.325 152.600006 1503000 74820200
2019-01-04 151.740005 4.585 152.339996 2020700 74709300
> df2
Adj Close Close High Low \
GBTC TLT GBTC TLT GBTC TLT GBTC
Date
2018-12-31 3.965 116.845848 3.965 121.510002 4.15 121.559998 3.95
2019-01-02 4.620 117.461304 4.620 122.150002 4.65 122.160004 4.13
2019-01-03 4.520 118.797966 4.520 123.540001 4.62 123.860001 4.32
2019-01-04 4.530 117.422844 4.530 122.110001 4.65 122.559998 4.41
Open Volume
TLT GBTC TLT GBTC TLT
Date
2018-12-31 120.459999 4.140 120.650002 3829000 17409000
2019-01-02 121.339996 4.155 121.660004 2948200 19841500
2019-01-03 122.230003 4.325 122.290001 1503000 21187000
2019-01-04 121.650002 4.585 122.339996 2020700 12970200
Both df1 and df2 contain the GBTC columns.
How can I merge df1 and df2 into a new DataFrame with the following contents?
> df3
Adj Close Close \
GBTC QQQ TLT GBTC QQQ TLT
Date
2018-12-31 3.965 152.132996 116.845848 3.965 154.259995 121.510002
2019-01-02 4.620 152.744461 117.461304 4.620 154.880005 122.150002
2019-01-03 4.520 147.754257 118.797966 4.520 149.820007 123.540001
2019-01-04 4.530 154.075851 117.422844 4.530 156.229996 122.110001
High Low Open \
GBTC QQQ TLT GBTC QQQ TLT GBTC
Date
2018-12-31 4.15 154.979996 121.559998 3.95 152.710007 120.459999 4.140
2019-01-02 4.65 155.750000 122.160004 4.13 150.880005 121.339996 4.155
2019-01-03 4.62 153.259995 123.860001 4.32 149.490005 122.230003 4.325
2019-01-04 4.65 157.000000 122.559998 4.41 151.740005 121.650002 4.585
Volume
QQQ TLT GBTC QQQ TLT
Date
2018-12-31 154.470001 120.650002 3829000 53015300 17409000
2019-01-02 150.990005 121.660004 2948200 58576700 19841500
2019-01-03 152.600006 122.290001 1503000 74820200 21187000
2019-01-04 152.339996 122.339996 2020700 74709300 12970200
I may have more than one overlapping column.
It seems that pandas.DataFrame.merge cannot do what I want.
Answer 0 (score: 1)
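Use unstack() on each DataFrame so that you have two long-format frames to merge(), then use pivot() to reshape the result back to wide format:
import pandas as pd
# Flatten each frame to long format: columns level_0, level_1, Date and 0 (the value)
dfm = pd.merge(df1.unstack().to_frame().reset_index(),
               df2.unstack().to_frame().reset_index(),
               on=["level_0", "level_1", "Date"], how="outer")
# Fill values missing from df2 with those from df1, then pivot back to wide format
(dfm.assign(**{"0_y": dfm["0_y"].fillna(dfm["0_x"])})
 .drop(columns="0_x")
 .rename(columns={"0_y": 0})
 .pivot(index=["level_0", "level_1"], columns="Date", values=0).T
)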
Date | ('Adj Close', 'GBTC') | ('Adj Close', 'QQQ') | ('Adj Close', 'TLT') | ('Close', 'GBTC') | ('Close', 'QQQ') | ('Close', 'TLT') | ('High', 'GBTC') | ('High', 'QQQ') | ('High', 'TLT') | ('Low', 'GBTC') | ('Low', 'QQQ') | ('Low', 'TLT') | ('Open', 'GBTC') | ('Open', 'QQQ') | ('Open', 'TLT') | ('Volume', 'GBTC') | ('Volume', 'QQQ') | ('Volume', 'TLT') |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019-01-02 00:00:00 | 4.62 | 152.744 | 117.461 | 4.62 | 154.88 | 122.15 | 4.65 | 155.75 | 122.16 | 4.13 | 150.88 | 121.34 | 4.155 | 150.99 | 121.66 | 2.9482e+06 | 5.85767e+07 | 1.98415e+07 |
2019-01-03 00:00:00 | 4.52 | 147.754 | 118.798 | 4.52 | 149.82 | 123.54 | 4.62 | 153.26 | 123.86 | 4.32 | 149.49 | 122.23 | 4.325 | 152.6 | 122.29 | 1.503e+06 | 7.48202e+07 | 2.1187e+07 |
2019-01-04 00:00:00 | 4.53 | 154.076 | 117.423 | 4.53 | 156.23 | 122.11 | 4.65 | 157 | 122.56 | 4.41 | 151.74 | 121.65 | 4.585 | 152.34 | 122.34 | 2.0207e+06 | 7.47093e+07 | 1.29702e+07 |
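
If the overlapping columns hold identical data in both frames (as GBTC does here), a shorter route may be to concatenate along the column axis and drop the repeated columns. The sketch below is not the method used in the answer above. It runs on small made-up frames rather than a live yfinance download, and `make_frame` is a hypothetical helper that only imitates the (field, ticker) column layout shown in the question.

```python
import pandas as pd

dates = pd.to_datetime(["2018-12-31", "2019-01-02"]).rename("Date")

def make_frame(tickers, values):
    # Hypothetical helper: builds a frame with (field, ticker) MultiIndex
    # columns, imitating the layout of the frames in the question.
    cols = pd.MultiIndex.from_product([["Close", "Volume"], tickers])
    return pd.DataFrame(values, index=dates, columns=cols)

# Column order per row: Close/GBTC, Close/<other>, Volume/GBTC, Volume/<other>
df1 = make_frame(["GBTC", "QQQ"], [[3.965, 154.26, 3829000, 53015300],
                                   [4.620, 154.88, 2948200, 58576700]])
df2 = make_frame(["GBTC", "TLT"], [[3.965, 121.51, 3829000, 17409000],
                                   [4.620, 122.15, 2948200, 19841500]])

# Place the frames side by side; GBTC columns now appear twice.
combined = pd.concat([df1, df2], axis=1)

# Keep only the first occurrence of each column label, then sort the
# columns so the tickers are grouped under each field.
df3 = combined.loc[:, ~combined.columns.duplicated()].sort_index(axis=1)
print(df3)
```

Because `duplicated()` keeps the first copy of each label, this approach silently prefers df1's values. If the overlapping columns could differ between the two frames, or one frame has gaps the other should fill, the merge-and-fillna approach above is the safer choice.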