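My DataFrame looks like the one below.
The index is a combination of three variables: "Date", "Time", and "Status".
I have two questions:
1) Can I convert this index into three regular columns?
2) Can I combine "Date" and "Time" into a new time series index?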
In [19]: mydf
Out[19]:
                                   Entry        Exit
Date     Time     Status
01-01-13 03:22:00 REGULAR     57308825.0  43088232.0
         07:22:00 REGULAR     57309553.0  43088773.0
         11:22:00 REGULAR     57310969.0  43089369.0
         15:22:00 REGULAR     57313168.0  43090433.0
         19:22:00 REGULAR     57315755.0  43092329.0
         23:22:00 REGULAR     57317073.0  43093590.0
01-02-13 03:22:00 REGULAR     57317475.0  43094076.0
         07:22:00 REGULAR     57319975.0  43094558.0
         07:37:45 DOOR OPEN    1524176.0    672377.0
         07:37:46 DOOR CLOSE   1524176.0    672377.0
         11:22:00 REGULAR     57326441.0  43097377.0
         15:22:00 REGULAR     57331818.0  43100057.0
         19:22:00 REGULAR     57337786.0  43103918.0
         23:22:00 REGULAR     57339290.0  43105700.0
01-03-13 03:22:00 REGULAR     57339676.0  43106258.0
         07:22:00 REGULAR     57342339.0  43106791.0
         09:37:39 REGULAR     23317856.0  15286811.0
         11:22:00 REGULAR     57349205.0  43109943.0
         15:22:00 REGULAR     57354752.0  43112434.0
         15:22:01 DOOR OPEN       4558.0         0.0
         15:27:12 LOGON           4558.0         0.0
         15:28:04 LGF-MAN         4558.0         0.0
         15:28:07 DOOR CLOSE      4558.0         0.0
         15:28:08 DOOR OPEN       4558.0         0.0
         15:28:21 DOOR CLOSE      4558.0         0.0
         18:25:09 DOOR OPEN    7171535.0   2420826.0
         18:27:14 DOOR CLOSE   7171535.0   2420826.0
         18:27:42 DOOR OPEN    7171535.0   2420826.0
         18:27:44 DOOR CLOSE   7171535.0   2420826.0
         18:27:45 DOOR OPEN    7171535.0   2420826.0
Answer 0 (score: 2)
Source DF:
In [46]: x
Out[46]:
a b
(01-01-13, 03:22:00, aaa) 0.773264 0.074545
(01-02-13, 03:22:00, bbb) 0.070720 0.806460
(01-03-13, 13:33:00, ccc) 0.645056 0.374402
Can I convert this index into three regular columns?
In [50]: x[['i1','i2','i3']] = pd.DataFrame(x.index.values.tolist()).values
In [51]: x
Out[51]:
a b i1 i2 i3
(01-01-13, 03:22:00, aaa) 0.773264 0.074545 01-01-13 03:22:00 aaa
(01-02-13, 03:22:00, bbb) 0.070720 0.806460 01-02-13 03:22:00 bbb
(01-03-13, 13:33:00, ccc) 0.645056 0.374402 01-03-13 13:33:00 ccc
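The example above works on a flat index whose entries are tuples. If the index is instead a genuine MultiIndex with named levels, as the printout in the question suggests, `reset_index()` may be all that is needed. The following is a minimal sketch under that assumption; the sample values and the level names `Date`, `Time`, and `Status` are made up to mirror the question.

```python
import pandas as pd

# Hypothetical sample shaped like the question's frame (values are invented).
idx = pd.MultiIndex.from_tuples(
    [
        ("01-01-13", "03:22:00", "REGULAR"),
        ("01-01-13", "07:22:00", "REGULAR"),
        ("01-02-13", "07:37:45", "DOOR OPEN"),
    ],
    names=["Date", "Time", "Status"],
)
df = pd.DataFrame({"Entry": [1.0, 2.0, 3.0], "Exit": [4.0, 5.0, 6.0]}, index=idx)

# reset_index() moves every index level into a regular column
# and leaves a default integer index behind.
flat = df.reset_index()
print(flat)
```

The level names become the column names, so no manual unpacking of tuples is required in this case.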
Can I combine "Date" and "Time" into a new time series index?
In [53]: x = x.set_index(pd.to_datetime(x.i1 + ' ' + x.i2, errors='coerce'))
In [54]: x
Out[54]:
a b i1 i2 i3
2013-01-01 03:22:00 0.773264 0.074545 01-01-13 03:22:00 aaa
2013-01-02 03:22:00 0.070720 0.806460 01-02-13 03:22:00 bbb
2013-01-03 13:33:00 0.645056 0.374402 01-03-13 13:33:00 ccc
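Note that `errors='coerce'` turns any string that fails to parse into `NaT` without raising. A stricter variant passes an explicit `format`, so that malformed values raise an error instead. The sketch below assumes the dates are month-day-year, which matches the output above (`01-02-13` became `2013-01-02`); the frame `flat` and the index name `timestamp` are illustrative, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with the date and time already in columns i1 and i2.
flat = pd.DataFrame(
    {
        "i1": ["01-01-13", "01-02-13", "01-03-13"],
        "i2": ["03:22:00", "03:22:00", "13:33:00"],
        "a": [0.1, 0.2, 0.3],
    }
)

# Assumption: dates are month-day-year with a two-digit year.
stamps = pd.to_datetime(flat["i1"] + " " + flat["i2"], format="%m-%d-%y %H:%M:%S")

# Use the parsed timestamps as the index and drop the now-redundant text columns.
ts = flat.set_index(pd.DatetimeIndex(stamps, name="timestamp")).drop(columns=["i1", "i2"])
print(ts)
```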
You can also do this directly from the index values:
In [56]: x
Out[56]:
a b
(01-01-13, 03:22:00, aaa) 0.735934 0.488669
(01-02-13, 03:22:00, bbb) 0.214214 0.234158
(01-03-13, 13:33:00, ccc) 0.071282 0.747958
In [57]: x = x.set_index(pd.to_datetime(x.index.str[0] + ' ' + x.index.str[1], errors='coerce'))
In [58]: x
Out[58]:
a b
2013-01-01 03:22:00 0.735934 0.488669
2013-01-02 03:22:00 0.214214 0.234158
2013-01-03 13:33:00 0.071282 0.747958
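For a MultiIndex with named levels, the same idea can be expressed with `get_level_values` rather than the `.str` accessor. This is a sketch assuming levels named `Date`, `Time`, and `Status` and a month-day-year date format; the sample values and the index name `timestamp` are invented for illustration.

```python
import pandas as pd

# Hypothetical MultiIndex frame shaped like the one in the question.
idx = pd.MultiIndex.from_tuples(
    [
        ("01-01-13", "03:22:00", "REGULAR"),
        ("01-02-13", "07:37:45", "DOOR OPEN"),
        ("01-03-13", "15:27:12", "LOGON"),
    ],
    names=["Date", "Time", "Status"],
)
df = pd.DataFrame({"Entry": [1.0, 2.0, 3.0], "Exit": [4.0, 5.0, 6.0]}, index=idx)

# Pull the two string levels out of the index and join them element-wise.
date_part = df.index.get_level_values("Date")
time_part = df.index.get_level_values("Time")
stamps = pd.to_datetime(date_part + " " + time_part, format="%m-%d-%y %H:%M:%S")

# Keep Status as a regular column, then replace the index with the timestamps.
out = df.reset_index(level="Status")
out.index = stamps.rename("timestamp")
print(out)
```

Keeping `Status` as a column preserves the event type for each row once the index holds only timestamps.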
Answer 1 (score: 1)
We can start with:
>>> my_df = pd.DataFrame({'Entry': [5.73], 'Exit': [4.31]},
index=[('01-01-13', '03:22:00', 'REGULAR')])
>>> my_df
Entry Exit
(01-01-13, 03:22:00, REGULAR) 5.73 4.31
1) To convert to three columns:
>>> combined = pd.concat([my_df.reset_index()['index'].apply(pd.Series),
my_df.reset_index(drop=True)],
axis=1)
0 1 2 Entry Exit
0 01-01-13 03:22:00 REGULAR 5.73 4.31
You now have three new columns named 0, 1, and 2.
2) To combine the date and time into a DatetimeIndex:
>>> combined.set_index(pd.to_datetime(combined[0] + ' ' + combined[1]))
0 1 2 Entry Exit
2013-01-01 03:22:00 01-01-13 03:22:00 REGULAR 5.73 4.31
You can now drop columns 0 and 1 and rename column 2 as needed.
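That final cleanup could look roughly like the sketch below. The frame is rebuilt by hand so the snippet runs on its own; the new column name `Status` and the explicit month-day-year `format` are assumptions, not something the answer specifies.

```python
import pandas as pd

# Rebuild a frame equivalent to `combined` from the steps above.
combined = pd.DataFrame(
    {0: ["01-01-13"], 1: ["03:22:00"], 2: ["REGULAR"], "Entry": [5.73], "Exit": [4.31]}
)
combined = combined.set_index(
    pd.to_datetime(combined[0] + " " + combined[1], format="%m-%d-%y %H:%M:%S")
)

# Drop the raw date/time text columns and give column 2 a descriptive name.
tidy = combined.drop(columns=[0, 1]).rename(columns={2: "Status"})
print(tidy)
```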