Pandas: merging two dataframes on date ends up with an entire column of NaN

Posted: 2018-12-03 13:45:23

Tags: python pandas dataframe merge

I have gone through many examples of similar questions, but unfortunately none of them worked. I have two dataframes that I need to merge.

df1

    Date        HIGH      LOW       OPEN      CLOSE
0   2013-01-04  10734.23  10602.24  10604.50  10688.11
1   2013-01-07  10743.69  10589.70  10743.69  10599.01
2   2013-01-08  10602.12  10463.43  10544.21  10508.06
3   2013-01-09  10620.70  10398.61  10405.67  10578.57
4   2013-01-10  10686.12  10619.65  10635.11  10652.64
5   2013-01-11  10830.43  10748.06  10786.14  10801.57
6   2013-01-15  10952.31  10851.66  10914.65  10879.08
7   2013-01-16  10806.41  10591.30  10806.41  10600.44

df2

    Date        sentiment
0   2013-01-01  -0.027282
1   2013-01-02   0.063613
2   2013-01-03   0.091363
3   2013-01-04   0.092818
4   2013-01-05  -0.019002
5   2013-01-06  -0.033752
6   2013-01-07   0.060038
7   2013-01-08   0.081649
8   2013-01-09  -0.031924
9   2013-01-10   0.109111
10  2013-01-11  -0.057070
11  2013-01-12  -0.052431
12  2013-01-13   0.014726
13  2013-01-14   0.047232
14  2013-01-15   0.060790
15  2013-01-16  -0.067828
16  2013-01-17  -0.035174

Code used:

    merged_left = pd.merge(left=df1, right=df2, how='left', left_on='Date', right_on='Date')

As a result, I lose all of the sentiment data, as shown below:

    Date        HIGH      LOW       OPEN      CLOSE     sentiment
0   2013-01-04  10734.23  10602.24  10604.50  10688.11  NaN
1   2013-01-07  10743.69  10589.70  10743.69  10599.01  NaN
2   2013-01-08  10602.12  10463.43  10544.21  10508.06  NaN
3   2013-01-09  10620.70  10398.61  10405.67  10578.57  NaN
4   2013-01-10  10686.12  10619.65  10635.11  10652.64  NaN
5   2013-01-11  10830.43  10748.06  10786.14  10801.57  NaN
6   2013-01-15  10952.31  10851.66  10914.65  10879.08  NaN
7   2013-01-16  10806.41  10591.30  10806.41  10600.44  NaN
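One way to confirm that the keys themselves are failing to match is merge's indicator=True flag, which records where each row's key was found. This is a minimal sketch that assumes, purely for illustration, that both Date columns hold strings and that df2's strings carry a time part. The question does not show the actual dtypes, so this is not necessarily the poster's situation:

```python
import pandas as pd

# Hypothetical reproduction: both key columns are strings, but df2's strings
# include a time part, so "2013-01-04" never equals "2013-01-04 00:00:00".
df1 = pd.DataFrame({"Date": ["2013-01-04", "2013-01-07"],
                    "CLOSE": [10688.11, 10599.01]})
df2 = pd.DataFrame({"Date": ["2013-01-04 00:00:00", "2013-01-07 00:00:00"],
                    "sentiment": [0.092818, 0.060038]})

# Compare the key dtypes first; here they are the same, so merge raises no error.
print(df1["Date"].dtype, df2["Date"].dtype)

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(df1, df2, how="left", on="Date", indicator=True)
print(merged[["Date", "sentiment", "_merge"]])
```

If every row reports left_only, the keys differ in type or format rather than in the dates they represent. Normalizing both columns (for example with pd.to_datetime) before merging is then the natural next step.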

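df2 is the larger dataframe, with 2157 rows, and many of its dates do not appear in df1 (1447 rows). I don't need those dates. I only want the sentiment values for the dates that exist in df1, as shown below: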

    Date        HIGH      LOW       OPEN      CLOSE     sentiment
0   2013-01-04  10734.23  10602.24  10604.50  10688.11   0.092818
1   2013-01-07  10743.69  10589.70  10743.69  10599.01   0.060038
2   2013-01-08  10602.12  10463.43  10544.21  10508.06   0.081649
3   2013-01-09  10620.70  10398.61  10405.67  10578.57  -0.031924
4   2013-01-10  10686.12  10619.65  10635.11  10652.64   0.109111
5   2013-01-11  10830.43  10748.06  10786.14  10801.57  -0.057070
6   2013-01-15  10952.31  10851.66  10914.65  10879.08   0.060790
7   2013-01-16  10806.41  10591.30  10806.41  10600.44  -0.067828
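What's wanted here is effectively a lookup: for each date in df1, fetch the matching sentiment and leave df1's rows otherwise unchanged. The sketch below expresses that with Series.map instead of a merge. This is a swapped-in technique, not the approach from the answer. It assumes both Date columns are already datetimes and that df2 has at most one row per date, because map with a Series whose index contains duplicates raises an error:

```python
import pandas as pd

# Excerpt of the question's data (df1 trimmed to its CLOSE column).
df1 = pd.DataFrame({"Date": pd.to_datetime(["2013-01-04", "2013-01-07", "2013-01-15"]),
                    "CLOSE": [10688.11, 10599.01, 10879.08]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2013-01-04", "2013-01-05", "2013-01-06",
                                            "2013-01-07", "2013-01-15"]),
                    "sentiment": [0.092818, -0.019002, -0.033752, 0.060038, 0.060790]})

# Date -> sentiment lookup; assumes each date appears at most once in df2.
lookup = df2.set_index("Date")["sentiment"]

# Fetch the sentiment for each df1 date; dates that exist only in df2 are never looked up.
result = df1.assign(sentiment=df1["Date"].map(lookup))
print(result)
```

A df1 date with no entry in df2 would get NaN, which matches how='left' semantics.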

Any help would be greatly appreciated. I spent the whole weekend trying to solve this.

1 answer

Answer 0 (score: 0)

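The problem is that both Date columns need to be datetimes; if the key values differ in type or format, no dates match and every sentiment value comes back as NaN. An inner join, which is the default, is also what you need here, so how='inner' can be omitted: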

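    import pandas as pd
    # Give both key columns the same datetime64 dtype so equal dates actually match
    df1['Date'] = pd.to_datetime(df1['Date'])
    df2['Date'] = pd.to_datetime(df2['Date'])
    # The default how='inner' keeps only the dates present in both frames
    merged_left = pd.merge(df1, df2, on='Date')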
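The choice of join matters once the dtypes agree. The default inner join drops any df1 date that has no sentiment row, while how='left' (the question's original choice) keeps it with NaN. The sketch below compares the two on an excerpt of the question's data. The 2013-01-18 row and its CLOSE value are hypothetical, added only to show a df1 date that is missing from df2:

```python
import pandas as pd

# Excerpt of the question's data; the 2013-01-18 row (and its CLOSE value)
# is hypothetical and has no matching row in this df2 excerpt.
df1 = pd.DataFrame({"Date": ["2013-01-15", "2013-01-16", "2013-01-18"],
                    "CLOSE": [10879.08, 10600.44, 10700.00]})
df2 = pd.DataFrame({"Date": ["2013-01-14", "2013-01-15", "2013-01-16", "2013-01-17"],
                    "sentiment": [0.047232, 0.060790, -0.067828, -0.035174]})

# Give both key columns the same datetime64 dtype.
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])

inner = pd.merge(df1, df2, on="Date")             # default how="inner"
left = pd.merge(df1, df2, on="Date", how="left")  # keeps every df1 row

print(inner)
print(left)
```

If every df1 date is known to have a sentiment value, both joins give the same result. Otherwise, how='left' preserves all 1447 trading days and marks the gaps explicitly.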