Question

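I have a pandas dataframe with many rows, each holding a datetime and a sensor value. My goal is to add a column that counts the number of days until the sensor value next exceeds a threshold.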

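For example, for the data <2019-01-05 11:00:00, 200>, <2019-01-06 12:00:00, 250>, <2019-01-07 13:00:00, 300>, I would like the additional column to look like [1 day, 0 days, 0 days] for a threshold between 200 and 250, and [2 days, 1 day, 0 days] for a threshold between 250 and 300.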
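To pin down the intended semantics, here is a deliberately slow reference sketch built on the example data. The column names `timestamp` and `sensor_value` and the helper `rul_reference` are hypothetical. It assumes that "exceeds" means `>= threshold`, that rows are sorted by time, and that partial days are dropped (which is what `Timedelta.days` does).

```python
import pandas as pd

# Example data from the question; column names are assumptions
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-01-05 11:00:00", "2019-01-06 12:00:00", "2019-01-07 13:00:00"]
    ),
    "sensor_value": [200, 250, 300],
})


def rul_reference(df, threshold):
    """Slow O(n^2) reference: for each row, whole days until the first row
    at or after it whose value is >= threshold (None if there is none)."""
    result = []
    for i in range(len(df)):
        later = df.iloc[i:]
        hits = later.loc[later["sensor_value"] >= threshold, "timestamp"]
        if hits.empty:
            result.append(None)
        else:
            # Timedelta.days drops the leftover hours
            result.append((hits.iloc[0] - df["timestamp"].iloc[i]).days)
    return result


print(rul_reference(df, 225))  # [1, 0, 0]
print(rul_reference(df, 275))  # [2, 1, 0]
```

This is only useful for checking a faster implementation against small inputs; it rescans the remaining rows for every row.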

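I tried subsampling the dataframe by the threshold with df_sub = df[df[sensor_value] >= threshold], iterating over both dataframes, and, for each timestamp in df, looking up the next timestamp in df_sub. However, this solution seems inefficient, and I suspect pandas has an optimized way to compute what I need.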
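The subsample-and-lookup idea itself does not need Python-level iteration: because the timestamps in `df_sub` are sorted, a binary search with `np.searchsorted` finds, for every row, the first qualifying timestamp at or after it. A minimal sketch, assuming a time-sorted frame with the hypothetical columns `timestamp` and `sensor_value`; `rul_searchsorted` is an illustrative name, not a pandas function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-01-05 11:00:00", "2019-01-06 12:00:00", "2019-01-07 13:00:00"]
    ),
    "sensor_value": [200, 250, 300],
})


def rul_searchsorted(df, threshold):
    """Whole days until the next row (at or after the current one) whose
    'sensor_value' is >= threshold; df must be sorted by 'timestamp'."""
    # Subsample as in the question: rows at or above the threshold
    df_sub = df[df["sensor_value"] >= threshold]
    hit_times = df_sub["timestamp"].to_numpy()
    times = df["timestamp"].to_numpy()

    # Binary search: index of the first hit timestamp >= each timestamp
    idx = np.searchsorted(hit_times, times, side="left")

    # Rows with no later hit keep NaT, which becomes NaN days
    has_hit = idx < len(hit_times)
    next_hit = np.full(len(times), np.datetime64("NaT"), dtype="datetime64[ns]")
    next_hit[has_hit] = hit_times[idx[has_hit]]

    return (pd.Series(next_hit, index=df.index) - df["timestamp"]).dt.days


print(rul_searchsorted(df, 275).tolist())  # [2, 1, 0]
```

The lookup is O(n log m) for n rows and m qualifying rows, instead of a nested loop over both frames.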

The sample code below is what I tried, as described above.

df

Expected output (for the example above):

result

Answer 1

Instead of splitting the dataframe, you can use .loc, which lets you filter and iterate over the thresholds in the same way:

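# threshold_list and <your_rule> are placeholders for your own thresholds and filter rule
df['RUL'] = '[2 days, 1 day, 0 days]'
for threshold in threshold_list:
    # overwrite 'RUL' only on the rows selected by the rule
    df.loc[df['sensor_value'] > <your_rule>, 'RUL'] = '[1 day, 0 days, 0 days]'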

This technique avoids splitting the dataframe.
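The snippet above only marks rows and writes placeholder strings. A runnable sketch of the same .loc-per-threshold loop that computes actual day counts might look like the following; `threshold_list`, the `RUL_<threshold>` column names and the back-fill step (the same idea as in the next answer) are assumptions for illustration, and the frame is assumed to be sorted by `timestamp`.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-01-05 11:00:00", "2019-01-06 12:00:00", "2019-01-07 13:00:00"]
    ),
    "sensor_value": [200, 250, 300],
})

# Hypothetical thresholds; one output column is written per threshold
threshold_list = [225, 275]

for threshold in threshold_list:
    # .loc selects the timestamps of the rows that reach this threshold
    hit_times = df.loc[df["sensor_value"] >= threshold, "timestamp"]
    # realign to every row (NaT where the threshold is not reached) and
    # back-fill so each row carries the next qualifying timestamp
    next_hit = hit_times.reindex(df.index).bfill()
    # whole days until that timestamp (NaN if none follows)
    df[f"RUL_{threshold}"] = (next_hit - df["timestamp"]).dt.days

print(df)
```

No sub-dataframe is kept around; each pass only builds a temporary Series of qualifying timestamps.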

Answer 2

This is how I would do it for a single threshold:

def calc_rul(df, thresh):
    # assumes df is sorted by the datetime column 'timestamp'
    # mark all the rows whose value is greater than or equal to thresh
    markers = df['value'].ge(thresh)

    # keep the timestamp of the marked rows only (NaT elsewhere)
    df['last_day'] = df['timestamp'].where(markers)

    # back fill those dates so each row holds the next marked timestamp
    df['last_day'] = df['last_day'].bfill()

    # whole days until that timestamp (NaN if no later row reaches thresh)
    df['RUL'] = (df['last_day'] - df['timestamp']).dt.days

    # drop the helper column if necessary,
    # remove this line to better see how the code works
    df.drop('last_day', axis=1, inplace=True)


calc_rul(df, 300)
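For comparison, the same column can be sketched with `pd.merge_asof(..., direction="forward")`, which matches each row to the first qualifying timestamp at or after it. This is a swapped-in technique, not part of the answer above; `calc_rul_asof` is a hypothetical name, and it assumes the answer's column names `timestamp` and `value` and a frame sorted by `timestamp` (which `merge_asof` requires).

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-01-05 11:00:00", "2019-01-06 12:00:00", "2019-01-07 13:00:00"]
    ),
    "value": [200, 250, 300],
})


def calc_rul_asof(df, thresh):
    """Whole days until the next row (at or after the current one) whose
    'value' is >= thresh; df must be sorted by 'timestamp'."""
    # Right table: only the timestamps that reach the threshold
    hits = df.loc[df["value"] >= thresh, ["timestamp"]].rename(
        columns={"timestamp": "next_hit"}
    )
    # Forward as-of join: first hit whose time is >= each row's time
    merged = pd.merge_asof(
        df[["timestamp"]],
        hits,
        left_on="timestamp",
        right_on="next_hit",
        direction="forward",
    )
    # Return a plain array so it can be assigned back regardless of index
    return (merged["next_hit"] - merged["timestamp"]).dt.days.to_numpy()


df["RUL"] = calc_rul_asof(df, 300)
print(calc_rul_asof(df, 300))  # [2 1 0]
```

On the question's example this gives [2, 1, 0] for a threshold of 300 and [1, 0, 0] for 250; rows with no later qualifying reading come out as NaN.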

Efficiently computing remaining useful life (RUL) in pandas
