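I have several DataFrames with date indexes from multiple sources, and I want to combine them into a single MultiIndex DataFrame. I'm struggling to figure out how to do this.
Starting with two DataFrames:
Source 1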
+---------------------+------+------+-----+-------+
| date | open | high | low | close |
+---------------------+------+------+-----+-------+
| 2018-04-04 20:00:00 | xxx | xxx | xxx | xxx |
| 2018-04-04 21:00:00 | xxx | xxx | xxx | xxx |
| 2018-04-04 22:00:00 | xxx | xxx | xxx | xxx |
+---------------------+------+------+-----+-------+
Source 2
+---------------------+------+------+-----+-------+
| date | open | high | low | close |
+---------------------+------+------+-----+-------+
| 2018-04-04 20:00:00 | xxx | xxx | xxx | xxx |
| 2018-04-04 21:00:00 | xxx | xxx | xxx | xxx |
| 2018-04-04 22:00:00 | xxx | xxx | xxx | xxx |
+---------------------+------+------+-----+-------+
I want to combine them so that the result is multi-indexed by date and then by source (source1 or source2).
Something like this:
+---------------------+---------+------+-----+-------+
| | | | | |
+---------------------+---------+------+-----+-------+
| 2018-04-04 20:00:00 | source1 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
| | source2 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
| 2018-04-04 21:00:00 | source1 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
| | source2 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
| 2018-04-04 22:00:00 | source1 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
| | source2 | | | |
| | open | high | low | close |
| | xxx | xxx | xxx | xxx |
+---------------------+---------+------+-----+-------+
Can anyone help?
Thanks!
Answer 0 (score: 0)
You can use concat and specify keys, i.e.
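# keys adds an outer index level holding the source label; reset_index moves it into a column named level_0
df3 = pd.concat([df1,df2],keys=['source1','source2']).reset_index(level=0)
# index by date first, then by source, and sort so rows with the same date sit together
df3 = df3.set_index(['date','level_0']).sort_index(level='date')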
open high low close
date level_0
2018-04-04 20:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
2018-04-04 21:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
2018-04-04 22:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
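For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The numeric values are made up (the post only shows xxx), and the level names "source" and "row" are my own choice, passed through concat's names argument so the source level does not end up with the default level_0 label.

```python
import pandas as pd

# Hypothetical sample data; the original post elides the real values as "xxx".
dates = pd.to_datetime(
    ["2018-04-04 20:00:00", "2018-04-04 21:00:00", "2018-04-04 22:00:00"]
)
df1 = pd.DataFrame({"date": dates, "open": [1.0, 2.0, 3.0], "high": [1.5, 2.5, 3.5],
                    "low": [0.5, 1.5, 2.5], "close": [1.2, 2.2, 3.2]})
df2 = pd.DataFrame({"date": dates, "open": [10.0, 20.0, 30.0], "high": [10.5, 20.5, 30.5],
                    "low": [9.5, 19.5, 29.5], "close": [10.2, 20.2, 30.2]})

# names= labels the two index levels concat produces: the key and the old row number.
stacked = pd.concat([df1, df2], keys=["source1", "source2"], names=["source", "row"])

df3 = (
    stacked.reset_index(level="source")   # move the source label into a column
    .set_index(["date", "source"])        # index by date, then source (drops "row")
    .sort_index()                         # group rows with the same date together
)
print(df3)
```

Naming the level up front is only a convenience; renaming level_0 afterwards would presumably work just as well.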
Answer 1 (score: 0)
Use concat with keys, together with set_index to get a DatetimeIndex, then swaplevel followed by sort_index:
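# set_index('date') makes date the index; keys adds the source label as the outer level
df = (pd.concat([df1.set_index('date'),df2.set_index('date')], keys=['source1','source2'])
        .swaplevel(0,1)   # put date first, source second
        .sort_index())    # sort so each date's sources are adjacent
print(df)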
open high low close
date
2018-04-04 20:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
2018-04-04 21:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
2018-04-04 22:00:00 source1 xxx xxx xxx xxx
source2 xxx xxx xxx xxx
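Once the frame is built this way, pulling out one source or one timestamp is a cross-section on the matching level. The sketch below assumes made-up numbers in place of xxx and names the levels "source" and "date" through concat's names argument (the code above leaves the source level unnamed); DataFrame.xs is used for the selection.

```python
import pandas as pd

# Hypothetical sample data standing in for the "xxx" placeholders.
dates = pd.to_datetime(
    ["2018-04-04 20:00:00", "2018-04-04 21:00:00", "2018-04-04 22:00:00"]
)
df1 = pd.DataFrame({"date": dates, "open": [1.0, 2.0, 3.0], "high": [1.5, 2.5, 3.5],
                    "low": [0.5, 1.5, 2.5], "close": [1.2, 2.2, 3.2]})
df2 = pd.DataFrame({"date": dates, "open": [10.0, 20.0, 30.0], "high": [10.5, 20.5, 30.5],
                    "low": [9.5, 19.5, 29.5], "close": [10.2, 20.2, 30.2]})

df = (
    pd.concat(
        [df1.set_index("date"), df2.set_index("date")],
        keys=["source1", "source2"],
        names=["source", "date"],  # assumed level names, chosen for readability
    )
    .swaplevel(0, 1)  # date becomes the outer level
    .sort_index()
)

# All rows for one source; the result is indexed by date only.
one_source = df.xs("source1", level="source")

# Both sources at one timestamp; the result is indexed by source only.
one_hour = df.xs(pd.Timestamp("2018-04-04 21:00:00"), level="date")

print(one_source)
print(one_hour)
```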