Python Pandas: Splitting a DatetimeIndex in two at a missing timestamp

Time: 2017-01-07 03:56:35

Tags: python pandas numpy

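I have a DatetimeIndex, shown below. As you can see, the timestamps are evenly spaced, except for the jump in the middle from '2005-03-11 15:00:00' to '2005-03-13 17:00:00'.

How can I programmatically split the DatetimeIndex at the point where timestamps are missing and get back two DatetimeIndexes?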

DateTimeIndex(['2005-03-11 11:00:00', '2005-03-11 11:30:00',
               '2005-03-11 12:00:00', '2005-03-11 12:30:00',
               '2005-03-11 13:00:00', '2005-03-11 13:30:00',
               '2005-03-11 14:00:00', '2005-03-11 14:30:00',
               '2005-03-11 15:00:00', '2005-03-13 17:00:00',
               '2005-03-13 17:30:00', '2005-03-13 18:00:00',
               '2005-03-13 18:30:00', '2005-03-13 19:00:00',
               '2005-03-13 19:30:00', '2005-03-13 20:00:00',
               '2005-03-13 20:30:00', '2005-03-13 21:00:00',
               '2005-03-13 21:30:00', '2005-03-13 22:00:00',
               '2005-03-13 22:30:00', '2005-03-13 23:00:00',
               '2005-03-13 23:30:00', '2005-03-14 00:00:00')]

4 Answers:

Answer 0: (score: 1)

You can use diff to find the gaps in the sequence, then use numpy.split to split at those gaps:

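# get the time difference between each timestamp
time_diffs = data.to_series().diff()

# positions where the step is larger than the typical (median) step;
# np.where returns a tuple, so take its first element
breaks = np.where(time_diffs > time_diffs.median())[0]

# split the underlying array at each break and rebuild an index per piece
new_data = [pd.DatetimeIndex(piece) for piece in np.split(data.values, breaks)]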

Edit: an earlier version of this answer, which uses an explicit loop instead of numpy.split and numpy.diff instead of pandas.Series.diff:

time_diffs = np.diff(data)
new_data = []
start_idx = 0

# loop once for each break in the data
for idx in np.where(time_diffs > np.median(time_diffs))[0]:

    # build a new piece at each break in the data
    new_data.append(data[start_idx:idx+1])
    start_idx = idx+1

# add the last piece to the list
new_data.append(data[start_idx:])

The code above can be run with the following data:

import numpy as np
import pandas as pd

data = pd.DatetimeIndex([
    '2005-03-11 11:00:00', '2005-03-11 11:30:00',
    '2005-03-11 12:00:00', '2005-03-11 12:30:00',
    '2005-03-11 13:00:00', '2005-03-11 13:30:00',
    '2005-03-11 14:00:00', '2005-03-11 14:30:00',
    '2005-03-11 15:00:00', '2005-03-13 17:00:00',
    '2005-03-13 17:30:00', '2005-03-13 18:00:00',
    '2005-03-13 18:30:00', '2005-03-13 19:00:00',
    '2005-03-13 19:30:00', '2005-03-13 20:00:00',
    '2005-03-13 20:30:00', '2005-03-13 21:00:00',
    '2005-03-13 21:30:00', '2005-03-13 22:00:00',
    '2005-03-13 22:30:00', '2005-03-13 23:00:00',
    '2005-03-13 23:30:00', '2005-03-14 00:00:00'
])
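
Putting the pieces together, a minimal self-contained sketch of the diff-and-split idea might look like the following. The helper name split_at_gaps is hypothetical, the sample index is rebuilt with pd.date_range for brevity, and plain slicing stands in for numpy.split. It assumes the index is sorted and that any step larger than the median step counts as a gap.

```python
import numpy as np
import pandas as pd

# Rebuild the 24 sample timestamps as two regular half-hourly runs.
data = pd.date_range('2005-03-11 11:00:00', '2005-03-11 15:00:00', freq='30min').append(
    pd.date_range('2005-03-13 17:00:00', '2005-03-14 00:00:00', freq='30min'))


def split_at_gaps(index):
    """Split a sorted DatetimeIndex wherever the step exceeds the median step."""
    steps = index.to_series().diff()
    # Positions whose distance to the previous timestamp is larger than usual.
    breaks = np.flatnonzero((steps > steps.median()).to_numpy())
    # Turn the break positions into (start, stop) slice bounds.
    bounds = [0, *breaks, len(index)]
    return [index[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


pieces = split_at_gaps(data)
for piece in pieces:
    print(piece[0], '->', piece[-1], f'({len(piece)} timestamps)')
```

Because the bounds are derived from every detected break, the same helper should also handle an index with more than one gap, returning one piece per regular run.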

Answer 1: (score: 1)

This should work. Your code also had some syntax errors.

times = pd.DatetimeIndex(['2005-03-11 11:00:00', '2005-03-11 11:30:00',
           '2005-03-11 12:00:00', '2005-03-11 12:30:00',
           '2005-03-11 13:00:00', '2005-03-11 13:30:00',
           '2005-03-11 14:00:00', '2005-03-11 14:30:00',
           '2005-03-11 15:00:00', '2005-03-13 17:00:00',
           '2005-03-13 17:30:00', '2005-03-13 18:00:00',
           '2005-03-13 18:30:00', '2005-03-13 19:00:00',
           '2005-03-13 19:30:00', '2005-03-13 20:00:00',
           '2005-03-13 20:30:00', '2005-03-13 21:00:00',
           '2005-03-13 21:30:00', '2005-03-13 22:00:00',
           '2005-03-13 22:30:00', '2005-03-13 23:00:00',
           '2005-03-13 23:30:00', '2005-03-14 00:00:00'])

early = pd.DatetimeIndex(times[:9])
late = pd.DatetimeIndex(times[9:])

If you are trying to split a DataFrame, try:

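time_split = '2005-03-11 15:00:00'
# .ix was removed in pandas 1.0; .loc does the same label-based slicing
early = df.loc[:time_split].index
# label slices include both endpoints, so use a strict comparison here
# to keep time_split out of the second half
late = df.index[df.index > time_split]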
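
For the DataFrame case, a self-contained sketch could look like this. The frame df and its value column are made up for illustration, and .loc is used because .ix is no longer available in pandas 1.0 and later. Boolean masks are one way to make sure the split timestamp lands in exactly one half; this is a sketch under those assumptions rather than the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a made-up 'value' column indexed by the sample timestamps.
index = pd.date_range('2005-03-11 11:00:00', '2005-03-11 15:00:00', freq='30min').append(
    pd.date_range('2005-03-13 17:00:00', '2005-03-14 00:00:00', freq='30min'))
df = pd.DataFrame({'value': np.arange(len(index))}, index=index)

time_split = pd.Timestamp('2005-03-11 15:00:00')

# Rows up to and including the split timestamp.
early = df.loc[df.index <= time_split]
# Rows strictly after the split timestamp.
late = df.loc[df.index > time_split]

print(len(early), len(late))
```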

Answer 2: (score: 1)

I'm assuming the differences are consistent up to the point where we split.
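
One way to read that assumption is sketched here; it is not necessarily the code this answer originally contained. The first step is taken as the expected spacing, and the index is cut at the first step that differs from it. The names times, expected and cut are illustrative, and the sketch assumes the index holds at least two timestamps.

```python
import numpy as np
import pandas as pd

# The sample timestamps, rebuilt as two regular half-hourly runs.
times = pd.date_range('2005-03-11 11:00:00', '2005-03-11 15:00:00', freq='30min').append(
    pd.date_range('2005-03-13 17:00:00', '2005-03-14 00:00:00', freq='30min'))

# Step between consecutive timestamps, as a timedelta64 array.
steps = np.diff(times.values)

# Assumption: the first step is the regular spacing.
expected = steps[0]

# Positions of steps that deviate from the regular spacing.
irregular = np.flatnonzero(steps != expected)

if irregular.size:
    # steps[k] lies between times[k] and times[k + 1], so cut after position k.
    cut = irregular[0] + 1
    first, second = times[:cut], times[cut:]
else:
    # No gap found: everything stays in the first piece.
    first, second = times, times[:0]

print(first[-1], second[0])
```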


Answer 3: (score: 0)

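# assumes DateTimeIndex is the pandas DatetimeIndex from the question
# this groups timestamps by whether they fall on the half hour
DateTimeIndex1 = []
DateTimeIndex2 = []
for i in DateTimeIndex:
    # str(i) looks like '2005-03-11 11:30:00'
    if '30:00' in str(i):
        DateTimeIndex1.append(i)
    else:
        DateTimeIndex2.append(i)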

Try the code above; hope it helps.