How to change values based on neighboring values in the same pandas column

Time: 2019-06-06 21:29:58

Tags: python pandas dataframe time-series

I am working with a dataset in a pandas DataFrame. It has two columns, timestamp and pump_state. The latter is either 0 or 1.


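Now I want to go through the pump_state column, look for 0s embedded between 1s, and change them to 1 if the time between the nearest 1 on each side is less than 5 minutes.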

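Take rows 52 to 55, for example. Two 0s are sandwiched between 1s. The 1 before the first 0 has the timestamp 23:52, and the 1 after the last 0 has the timestamp 23:56. These two 1s are less than 5 minutes apart, so the 0s need to be changed to 1. The same goes for the 0 in row 65.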
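Written as code, the check for that example is a single timedelta comparison. This is only a sketch: the date is an arbitrary assumption, since only the clock times 23:52 and 23:56 are given, and the variable names are made up for illustration.

```python
import pandas as pd

# Assumed date: only the clock times 23:52 and 23:56 are known.
last_one_before = pd.Timestamp('2019-05-29 23:52')
first_one_after = pd.Timestamp('2019-05-29 23:56')

# Time between the two 1s that surround the run of 0s
gap = first_one_after - last_one_before

# The 0s in between are changed to 1 only if the gap is under 5 minutes
should_fill = gap < pd.Timedelta(minutes=5)
print(gap, should_fill)
```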

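I could build a dictionary mapping timestamp to pump_state, iterate over the dict, change 0 to 1 according to this logic, and then update the DataFrame from the new dictionary. But is there a better (or more pandas-idiomatic) way?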

1 Answer:

Answer 0: (score: 0)

Consider the following approach (see the comments in the code):

import numpy as np
import pandas as pd

# create sample data
NUM = 30
df = pd.DataFrame({
    'timestamp': pd.date_range(start='5/29/2019 00:00:00',
                               periods=NUM, freq='1min'),
    'pump_state': [1] * NUM})
# rows 5-8: 4 zeros, 4 minutes from the first 0 to the next 1 (< 5 minutes)
df.loc[5:8, 'pump_state'] = 0
# rows 15-25: 11 zeros (.loc slices are inclusive), 11 minutes (> 5 minutes)
df.loc[15:25, 'pump_state'] = 0

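# search for rows where 0 switches to 1 and vice versa
df['diff'] = df['pump_state'].diff()
# flag each kind of switch with 1; all other rows get different sentinels
# (-1 and -2) so that they can never match each other in the merge below
df['diff_1'] = np.where(df['diff'] == 1, 1, -1)    # 0 -> 1 switches
df['diff_-1'] = np.where(df['diff'] == -1, 1, -2)  # 1 -> 0 switches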

# pair every 0 -> 1 switch (timestamp_x) with every 1 -> 0 switch (timestamp_y),
# like a join in SQL
df_support = pd.merge(
    df, df, how='inner',
    left_on='diff_1', right_on='diff_-1')[['timestamp_x', 'timestamp_y']]

# apply timing conditions to all pairs of switches
df_support = df_support[
    # less than 5 minutes
    (df_support['timestamp_x'] - df_support['timestamp_y'] < pd.Timedelta(minutes=5)) &
    # greater than zero
    (df_support['timestamp_x'] - df_support['timestamp_y'] > pd.Timedelta(0))]

# replace 0s with 1s where it is appropriate
for idx, row in df_support.iterrows():
    df.loc[(df['timestamp'] >= row['timestamp_y']) &
           (df['timestamp'] <= row['timestamp_x']),
           'pump_state'] = 1

df.drop(columns=['diff', 'diff_1', 'diff_-1'], inplace=True)
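
As an alternative, here is a minimal sketch that avoids both the merge and the loop. It is not the code above and it applies a slightly different rule: it measures each gap between the last 1 before the 0s and the first 1 after them, as the question describes, whereas the code above measures from the first 0 to the next 1. With one-minute sampling, a run of four 0s is therefore 5 minutes wide under this rule and is left unchanged. The function name fill_short_gaps and the max_gap parameter are made up for illustration; the column names timestamp and pump_state come from the question, and the rows are assumed to be sorted by timestamp.

```python
import pandas as pd


def fill_short_gaps(df, max_gap=pd.Timedelta(minutes=5)):
    """Return a copy of df where short runs of 0s between 1s are set to 1.

    Hypothetical helper: assumes columns 'timestamp' and 'pump_state'
    and rows sorted by 'timestamp'.
    """
    out = df.copy()
    is_on = out['pump_state'].eq(1)

    # keep the timestamp only on rows where the pump is on (NaT elsewhere)
    on_time = out['timestamp'].where(is_on)
    prev_on = on_time.ffill()  # timestamp of the nearest 1 at or before each row
    next_on = on_time.bfill()  # timestamp of the nearest 1 at or after each row

    # comparisons with NaT are False, so 0s without a 1 on both sides are skipped
    short = (next_on - prev_on) < max_gap
    out.loc[~is_on & short, 'pump_state'] = 1
    return out


# sample data, similar to the example above
df = pd.DataFrame({
    'timestamp': pd.date_range(start='2019-05-29 00:00:00',
                               periods=30, freq='1min'),
    'pump_state': [1] * 30})
df.loc[5:7, 'pump_state'] = 0    # surrounding 1s at 00:04 and 00:08, 4 minutes apart
df.loc[15:25, 'pump_state'] = 0  # surrounding 1s at 00:14 and 00:26, 12 minutes apart

result = fill_short_gaps(df)
print(result.loc[5:7, 'pump_state'].tolist())   # [1, 1, 1]
print(result.loc[15:25, 'pump_state'].sum())    # 0
```

Leading or trailing 0s have no 1 on one side, so their difference is NaT and they stay 0. The input DataFrame is not modified; the function works on a copy.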