Validating timestamps in a time series

Time: 2017-01-05 13:35:29

Tags: python time timestamp series verification

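I am working with time series data, and I would like to know whether there is an efficient and Pythonic way to verify that the sequence of timestamps associated with the series is valid. In other words, I want to know whether the timestamps are in correct ascending order, with no missing or duplicated values.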

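I think verifying the correct order and checking for duplicate values should be fairly simple, but I am not so sure about detecting missing timestamps.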
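
The ordering and duplicate checks I have in mind could be sketched roughly as follows. This is only a minimal sketch: the helper names `is_strictly_ascending` and `duplicated_timestamps` are made up for illustration, and it assumes the timestamps can be converted to NumPy `datetime64[s]` values (whole-second resolution).

```python
import numpy as np


def is_strictly_ascending(ts):
    """Return True if every timestamp is later than the one before it.

    Hypothetical helper; assumes `ts` converts to datetime64[s].
    """
    ts = np.asarray(ts, dtype="datetime64[s]")
    # A zero delta means a duplicate, a negative delta means out of order.
    return bool(np.all(np.diff(ts) > np.timedelta64(0, "s")))


def duplicated_timestamps(ts):
    """Return the sorted timestamps that occur more than once.

    Hypothetical helper; assumes `ts` converts to datetime64[s].
    """
    ts = np.asarray(ts, dtype="datetime64[s]")
    values, counts = np.unique(ts, return_counts=True)
    return values[counts > 1]
```

Neither check says anything about gaps, which is the part I am unsure about.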

1 Answer:

Answer 0: (score: 1)

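numpy.diff can be used to find the differences between consecutive timestamps. Those differences can then be evaluated to determine whether the timestamps look as expected: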

import numpy as np
import datetime as dt

def errant_timestamps(ts, expected_time_step=None, tolerance=0.02):
    # get the time delta between subsequent time stamps
    ts_diffs = np.array([tsd.total_seconds() for tsd in np.diff(ts)])

    # get the expected delta
    if expected_time_step is None:
        expected_time_step = np.median(ts_diffs)

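    # find the index of timestamps that don't match the spacing of the rest:
    # "slow" = the gap to the previous timestamp is shorter than expected
    #          (duplicates and out-of-order values give a zero or negative gap)
    # "fast" = the gap is longer than expected (a timestamp is probably missing)
    ts_slow_idx = np.where(ts_diffs < expected_time_step * (1-tolerance))[0] + 1
    ts_fast_idx = np.where(ts_diffs > expected_time_step * (1+tolerance))[0] + 1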

    # find the errant timestamps
    ts_slow = ts[ts_slow_idx]
    ts_fast = ts[ts_fast_idx]

    # if the timestamps appear valid, return None
    if len(ts_slow) == 0 and len(ts_fast) == 0:
        return None

    # return any errant timestamps
    return ts_slow, ts_fast


sample_timestamps = np.array(
    [dt.datetime.strptime(sts, "%d%b%Y %H:%M:%S") for sts in (
        "05Jan2017 12:45:00",
        "05Jan2017 12:50:00",
        "05Jan2017 12:55:00",
        "05Jan2017 13:05:00",
        "05Jan2017 13:10:00",
        "05Jan2017 13:00:00",
    )]
)

# flags 13:00 (arrives out of order) and 13:05 (follows a 10-minute gap)
print(errant_timestamps(sample_timestamps))
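
The function above flags where the spacing looks wrong, but it does not list which timestamps are absent. If the expected step is known, one possible way to list them is to compare the series against a regular grid. The sketch below is an assumption-laden illustration rather than part of the function above: `missing_timestamps` and `step_seconds` are hypothetical names, the timestamps are assumed to convert to `datetime64[s]`, and the step is assumed to be a whole number of seconds.

```python
import numpy as np


def missing_timestamps(ts, step_seconds):
    """Return the grid points between min(ts) and max(ts) that ts lacks.

    Hypothetical helper; assumes a regular grid that starts at min(ts).
    """
    ts = np.asarray(ts, dtype="datetime64[s]")
    step = np.timedelta64(int(step_seconds), "s")
    # Build the grid the series is expected to cover; adding one second
    # to the stop value keeps max(ts) itself when it lies on the grid.
    stop = ts.max() + np.timedelta64(1, "s")
    expected = np.arange(ts.min(), stop, step)
    # setdiff1d returns the sorted values of `expected` not found in `ts`.
    return np.setdiff1d(expected, ts)


gappy = ["2017-01-05T12:45:00", "2017-01-05T12:50:00", "2017-01-05T13:00:00"]
print(missing_timestamps(gappy, 300))
```

Because this compares sets of values, it ignores ordering, so it complements the spacing check rather than replacing it. Timestamps that fall off the grid are not reported by this sketch either.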