Pandas Datetime Interval Resample to Seconds

时间:2018-01-29 15:46:12

标签: python pandas

鉴于以下数据框:

import pandas as pd

pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
         "end":    ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
         "status": ["OK", "ERROR", "OK"]})

HAVE:

| start               | end                 | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK     |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR  | 
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK     |

我想将其转换为另一种格式,即"展开"间隔并将它们转换为DatetimeIndex,并重新采样数据。结果应如下所示:

想要:

|                     | status    |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK        |
| 2017-01-01 13:09:02 | OK        |
| 2017-01-01 13:09:03 | OK        |
| 2017-01-01 13:09:04 | OK        |
| 2017-01-01 13:09:05 | OK        |
| 2017-01-01 13:09:06 | NAN       |
| 2017-01-01 13:09:07 | ERROR     |
| 2017-01-01 13:09:08 | ERROR     |
| 2017-01-01 13:09:09 | ERROR     |
| 2017-01-01 13:09:10 | NAN       |
| 2017-01-01 13:09:11 | NAN       |
| 2017-01-01 13:09:12 | OK        |
| 2017-01-01 13:09:13 | OK        |
| 2017-01-01 13:09:14 | OK        |

非常感谢任何帮助!

2 个答案:

答案 0 :(得分:3)

使用IntervalIndex

# create an IntervalIndex from start/end
iv_idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')

# generate the desired index of individual times
new_idx = pd.date_range(df['start'].min(), df['end'].max(), freq='s')

# set the index of 'status' as the IntervalIndex, then reindex to the new index
result = df['status'].set_axis(iv_idx, inplace=False).reindex(new_idx)

result的结果输出:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, Name: status, dtype: object

答案 1 :(得分:1)

让我们尝试一下,使用apply并使用date_range重建系列,然后使用resample来填充asfreq填充NaN的缺失时间:

df.apply(lambda x: pd.Series(index=pd.date_range(x['start'], 
                                                 x['end'],
                                                 freq='S'), 
                             data=x['status']), axis=1)\
  .T\
  .stack().reset_index(level=1, drop=True)\
  .resample('S').asfreq()

输出:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, dtype: object