pandas time series: total duration for which a condition is met

Asked: 2019-11-12 10:44:18

Tags: pandas

I have a time series:

ts = pd.Series(data=[0,1,2,3,4],index=[pd.Timestamp('1991-01-01'),pd.Timestamp('1995-01-01'),pd.Timestamp('1996-01-01'),pd.Timestamp('2010-01-01'),pd.Timestamp('2011-01-01')])

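What is the fastest, most readable way to get the total duration for which the value is below 2, assuming each value stays valid until the next time step (that is, no linear interpolation)? I would have expected pandas to have a function for this.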
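To make the hold-until-the-next-step assumption concrete, here is a small sketch that lists how long each value is held. The names `held_for` and `intervals` are made up for illustration and are not part of pandas.

```python
import pandas as pd

ts = pd.Series(
    data=[0, 1, 2, 3, 4],
    index=pd.to_datetime(['1991-01-01', '1995-01-01', '1996-01-01',
                          '2010-01-01', '2011-01-01']),
)

# Time from each timestamp to the next one. The last value has no
# following timestamp, so its duration is NaT and the row is dropped.
held_for = ts.index.to_series().diff().shift(-1)
intervals = pd.DataFrame({'value': ts, 'held_for': held_for}).dropna()
print(intervals)
```

Under this reading, the value 0 is held from 1991-01-01 to 1995-01-01 and the value 1 from 1995-01-01 to 1996-01-01, so those two intervals are the ones that count as "below 2".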

1 answer:

Answer 0: (score: 0)

This seems to work well, but I am still puzzled that pandas does not appear to provide a function for this!

import pandas as pd
import numpy as np

ts = pd.Series(data=[0,1,2,3,4],index=[pd.Timestamp('1991-01-01'),pd.Timestamp('1995-01-01'),pd.Timestamp('1996-01-01'),pd.Timestamp('2010-01-01'),pd.Timestamp('2011-01-01')])

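# Boolean mask: True = below 2 (meets the condition), False = 2 or above
below_2 = ts < 2

# Length of each interval between consecutive timestamps (a TimedeltaIndex)
delta_time = ts.index[1:] - ts.index[:-1]

# Each value holds until the next timestamp, so the last value has no interval.
# The mask must be boolean: indexing with an integer 0/1 array would pick
# positions 0 and 1 instead of filtering.
mask = below_2.values[:-1]

time_below_2 = delta_time[mask].sum().total_seconds()
time_above_2 = delta_time[~mask].sum().total_seconds()  # 2 or above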

The snippet above seemed to break for some time ranges. This option is slower, but it has not failed in any of my tests:

def get_total_duration_above_and_below_value(value, ts):
    """Return (hours at or above value, hours below value).

    Each sample is treated as valid until the next timestamp, so the last
    sample contributes no duration.
    """
    # Boolean mask: True = at or above value, False = below value
    is_above = ts >= value

    time_above_value = 0.0
    time_below_value = 0.0
    for i in range(ts.size - 1):
        # Length of the interval from sample i to the next one, in hours
        hours = abs((ts.index[i + 1] - ts.index[i]).total_seconds()) / 3600
        if is_above.iloc[i]:
            time_above_value += hours
        else:
            time_below_value += hours

    return time_above_value, time_below_value
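
For comparison, the same totals can be computed without a Python-level loop. This is only a sketch, not a built-in pandas feature: `total_duration_where` is a hypothetical helper name, and it assumes the index is a sorted DatetimeIndex and that each value holds until the next timestamp.

```python
import pandas as pd

def total_duration_where(ts, mask):
    """Sum the time during which `mask` is True (hypothetical helper).

    `mask` is a boolean Series aligned with `ts`. Each sample is held until
    the next timestamp, so the last sample contributes nothing.
    """
    # Interval from each timestamp to the following one
    durations = ts.index[1:] - ts.index[:-1]
    # Drop the last mask entry so it lines up with the intervals
    return durations[mask.to_numpy()[:-1]].sum()

ts = pd.Series(
    data=[0, 1, 2, 3, 4],
    index=pd.to_datetime(['1991-01-01', '1995-01-01', '1996-01-01',
                          '2010-01-01', '2011-01-01']),
)

below = total_duration_where(ts, ts < 2)         # Timedelta of 1826 days
at_or_above = total_duration_where(ts, ts >= 2)  # Timedelta of 5479 days
hours_below = below.total_seconds() / 3600
```

Passing the condition in as a boolean Series keeps the helper independent of the threshold, so the same call works for `ts < 2`, `ts >= 2`, or any other element-wise test.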