Mapping the datetime columns of two tables

Asked: 2019-01-02 16:10:36

Tags: python pandas datetime

I created a DataFrame for 2018-01-01 that contains only a datetime column at one-second intervals, using the code below.

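import pandas as pd

# One row per second on 2018-01-01, up to 23:59:00
# (recent pandas versions prefer the lowercase alias freq="1s")
i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)
ts = ts.reset_index()
df1 = ts.rename(columns={'index': 'datetime'})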

df1:

    datetime
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16
17  2018-01-01 00:00:17
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29

I have another DataFrame with a datetime column plus other columns.

df2:

    datetime                a       b        c         d        e  
0   2018-01-01 00:00:04     0.9                                    
1   2018-01-01 00:00:06             0.6      0.7                   
2   2018-01-01 00:00:09     0.5              0.7       0.8         
3   2018-01-01 00:00:16             2.3      3.6       4.9      5.0
4   2018-01-01 00:00:17     0.9     3.5      5.5                   
5   2018-01-01 00:00:23     0.1     0.6      0.0       1.7         
6   2018-01-01 00:00:29     2.7     5.5      4.3

Now I am trying to map the datetime columns of df1 and df2 using a pandas outer join. I expect the result to look like this:

    datetime                a       b       c     d      e
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04     0.9
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06             0.6      0.7
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09     0.5              0.7   0.8
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16             2.3      3.6   4.9   5.0
17  2018-01-01 00:00:17     0.9     3.5      5.5
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23     0.1     0.6      0.0   1.7
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29     2.7     5.5      4.3

But my output looks like this:

    datetime                a       b        c         d        e
0   2018-01-01 00:00:00
1   2018-01-01 00:00:01
2   2018-01-01 00:00:02
3   2018-01-01 00:00:03
4   2018-01-01 00:00:04
5   2018-01-01 00:00:05
6   2018-01-01 00:00:06
7   2018-01-01 00:00:07
8   2018-01-01 00:00:08
9   2018-01-01 00:00:09
10  2018-01-01 00:00:10
11  2018-01-01 00:00:11
12  2018-01-01 00:00:12
13  2018-01-01 00:00:13
14  2018-01-01 00:00:14
15  2018-01-01 00:00:15
16  2018-01-01 00:00:16
17  2018-01-01 00:00:17
18  2018-01-01 00:00:18
19  2018-01-01 00:00:19
20  2018-01-01 00:00:20
21  2018-01-01 00:00:21
22  2018-01-01 00:00:22
23  2018-01-01 00:00:23
24  2018-01-01 00:00:24
25  2018-01-01 00:00:25
26  2018-01-01 00:00:26
27  2018-01-01 00:00:27
28  2018-01-01 00:00:28
29  2018-01-01 00:00:29  
30  2018-01-01 00:00:04     0.9                                    
31  2018-01-01 00:00:06             0.6      0.7                   
32  2018-01-01 00:00:09     0.5              0.7       0.8         
33  2018-01-01 00:00:16             2.3      3.6       4.9      5.0
34  2018-01-01 00:00:17     0.9     3.5      5.5                   
35  2018-01-01 00:00:23     0.1     0.6      0.0       1.7         
36  2018-01-01 00:00:29     2.7     5.5      4.3

The code I used for this operation is:

test = pd.merge(df1, df2, on='datetime', how='outer')

I'm not sure how to fix this and would appreciate any help.
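
One possible cause, which the output above does not confirm, is that the `datetime` keys of the two frames do not compare equal, for example because df2's column holds strings rather than `datetime64` values. The sketch below assumes that situation: it uses a small made-up df2 with string timestamps, converts the key with `pd.to_datetime`, and then uses a left merge so the result keeps exactly one row per timestamp of df1. The sample values are illustrative only.

```python
import pandas as pd

# Full one-second grid (first 30 seconds only, to keep the example small).
df1 = pd.DataFrame(
    {"datetime": pd.date_range("2018-01-01 00:00:00", periods=30, freq="s")}
)

# Sparse measurements. Storing the timestamps as strings is an assumption
# about how df2 may have been loaded, not something the question states.
df2 = pd.DataFrame({
    "datetime": ["2018-01-01 00:00:04", "2018-01-01 00:00:06", "2018-01-01 00:00:09"],
    "a": [0.9, None, 0.5],
    "b": [None, 0.6, None],
})

# Compare the key dtypes before merging.
print(df1["datetime"].dtype, df2["datetime"].dtype)

# Convert the key so both sides hold real timestamps.
df2["datetime"] = pd.to_datetime(df2["datetime"])

# A left merge keeps one row per timestamp in df1 and fills the rest with NaN.
merged = pd.merge(df1, df2, on="datetime", how="left")
print(merged.iloc[3:8])
```

If the keys already have matching dtypes, the mismatch may instead come from differing sub-second parts or time zones, which would need to be checked separately.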

1 Answer:

Answer 0 (score: 0)

Keep the datetime index and use reindex, as @Scott Boston mentioned in the comments:

i = pd.date_range(start='2018-01-01 00:00:00', end='2018-01-01 23:59:00', freq="1S")
ts = pd.DataFrame(index=i)

# df is the second data frame (df2 in the question)
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index('datetime').reindex(ts.index)


                    a   b   c   d   e
2018-01-01 00:00:00 NaN NaN NaN NaN NaN
2018-01-01 00:00:01 NaN NaN NaN NaN NaN
2018-01-01 00:00:02 NaN NaN NaN NaN NaN
2018-01-01 00:00:03 NaN NaN NaN NaN NaN
2018-01-01 00:00:04 0.9             
2018-01-01 00:00:05 NaN NaN NaN NaN NaN
2018-01-01 00:00:06     0.6 0.7     
2018-01-01 00:00:07 NaN NaN NaN NaN NaN
2018-01-01 00:00:08 NaN NaN NaN NaN NaN
2018-01-01 00:00:09 0.5     0.7 0.8 
2018-01-01 00:00:10 NaN NaN NaN NaN NaN
2018-01-01 00:00:11 NaN NaN NaN NaN NaN
2018-01-01 00:00:12 NaN NaN NaN NaN NaN
2018-01-01 00:00:13 NaN NaN NaN NaN NaN
2018-01-01 00:00:14 NaN NaN NaN NaN NaN
2018-01-01 00:00:15 NaN NaN NaN NaN NaN
2018-01-01 00:00:16     2.3 3.6 4.9 5.0
2018-01-01 00:00:17 0.9 3.5 5.5     
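
For reference, here is a self-contained sketch of the reindex approach on a 30-second grid. The two-row `df` is made-up sample data, and the final `rename_axis(...).reset_index()` step is an optional addition (not part of the answer's code) that turns the index back into a `datetime` column, as in the question's expected output.

```python
import pandas as pd

# Target grid: one timestamp per second (30 seconds for brevity).
grid = pd.date_range("2018-01-01 00:00:00", periods=30, freq="s")

# Hypothetical sparse data with string timestamps.
df = pd.DataFrame({
    "datetime": ["2018-01-01 00:00:04", "2018-01-01 00:00:16"],
    "a": [0.9, None],
    "b": [None, 2.3],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# Align the sparse rows to the grid; timestamps without data become NaN rows.
out = df.set_index("datetime").reindex(grid)

# Optional: restore 'datetime' as an ordinary column with a 0..n-1 index.
out = out.rename_axis("datetime").reset_index()
print(out.iloc[3:6])
```

Note that `reindex` requires the index being reindexed to be unique, so duplicate timestamps in `df` would have to be aggregated or dropped first.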

Option 2: concat

pd.concat([ts, df.set_index('datetime')], axis = 1)
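
A runnable sketch of the concat variant follows, again with made-up sample data. It assumes `ts` is the index-only frame from the answer (not the version with `reset_index` applied in the question), so that both frames are aligned on their datetime index.

```python
import pandas as pd

# Index-only frame holding the full one-second grid (30 seconds for brevity).
grid = pd.date_range("2018-01-01 00:00:00", periods=30, freq="s")
ts = pd.DataFrame(index=grid)

# Hypothetical sparse data.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-01-01 00:00:04", "2018-01-01 00:00:16"]),
    "a": [0.9, None],
    "b": [None, 2.3],
})

# Column-wise concat aligns both frames on their index (outer join by default).
out = pd.concat([ts, df.set_index("datetime")], axis=1)
print(out.iloc[3:6])
```

One difference from `reindex` is worth keeping in mind: because the default join is an outer join, any timestamp in `df` that is not on the grid would be added as an extra row rather than dropped.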