I am working on a dataset using a Pandas data frame. It has two columns, timestamp and pump_state. The latter is either 0 or 1.
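Now I want to iterate over the pump_state column, look for zeros embedded in ones, and change them to 1 if the time interval between the nearest 1s is less than 5 minutes.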
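For example, take rows 52 to 55. Two 0s are sandwiched between 1s. The timestamp of the 1 before the first 0 is 23:52, and the timestamp of the 1 after the last 0 is 23:56. The time difference between these two 1s is less than 5 minutes, so the 0s need to be changed to 1. The same applies to the 0 in row 65.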
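I could build a dictionary from timestamp to pump_state, iterate over the dict, and change 0s to 1 according to this logic, then update the data frame with the new dictionary. But is there a better (or more pandas-idiomatic) way?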
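For reference, the rule described above can be written as a plain-Python baseline. This is only a minimal sketch: the sample data is hypothetical (it mirrors the 23:52 / 23:56 example), the helper name fill_short_gaps_loop is made up for illustration, and it walks two lists rather than a dict.

```python
import pandas as pd

# Hypothetical sample: 1s up to 23:52, 0s at 23:53-23:55, 1s again from 23:56.
df = pd.DataFrame({
    "timestamp": pd.date_range("2019-05-28 23:50", periods=8, freq="1min"),
    "pump_state": [1, 1, 1, 0, 0, 0, 1, 1],
})


def fill_short_gaps_loop(frame, max_gap=pd.Timedelta(minutes=5)):
    """Return a copy where 0s enclosed by 1s less than max_gap apart become 1."""
    times = frame["timestamp"].tolist()
    states = frame["pump_state"].tolist()
    last_one = None  # position of the most recent 1 seen so far
    for i, state in enumerate(states):
        if state != 1:
            continue
        has_zeros_between = last_one is not None and i - last_one > 1
        if has_zeros_between and times[i] - times[last_one] < max_gap:
            # The enclosing 1s are close enough: fill the zeros between them.
            for j in range(last_one + 1, i):
                states[j] = 1
        last_one = i
    out = frame.copy()
    out["pump_state"] = states
    return out


result_loop = fill_short_gaps_loop(df)
```

Leading or trailing 0s have no enclosing pair of 1s, so this sketch leaves them unchanged.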
Answer 0 (score: 0)
Consider the following approach (follow the comments):
import numpy as np
import pandas as pd
# create sample data
NUM = 30
df = pd.DataFrame({
    'timestamp': pd.date_range(start='5/29/2019 00:00:00',
                               periods=NUM, freq='1min'),
    'pump_state': [1] * NUM})
df.loc[5:8, 'pump_state'] = 0 # 4 zeros - 4 minutes < 5 minutes
df.loc[15:25, 'pump_state'] = 0 # 10 zeros - 10 minutes > 5 minutes
# search for rows where 0 switches to 1 and vice versa
df['diff'] = df['pump_state'].diff()
df['diff_1'] = np.where(df['diff'] == 1, 1, -1)
df['diff_-1'] = np.where(df['diff'] == -1, 1, -2)
# merge all found switches (like join in SQL)
df_support = pd.merge(
    df, df, how='inner',
    left_on='diff_1', right_on='diff_-1')[['timestamp_x', 'timestamp_y']]
# apply timing conditions to all pairs of switches
# (timestamp_x: first 1 after a gap, timestamp_y: first 0 of a gap)
df_support = df_support[
    # less than 5 minutes
    (df_support['timestamp_x'] - df_support['timestamp_y'] < pd.Timedelta(minutes=5)) &
    # greater than zero
    (df_support['timestamp_x'] - df_support['timestamp_y'] > pd.Timedelta(0))]
# replace 0s with 1s where it is appropriate
for idx, row in df_support.iterrows():
    df.loc[(df['timestamp'] >= row['timestamp_y']) &
           (df['timestamp'] <= row['timestamp_x']),
           'pump_state'] = 1
df.drop(columns=['diff', 'diff_1', 'diff_-1'], inplace=True)
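A loop-free variant is also possible. The sketch below is an alternative technique, not the approach above: it forward-fills and back-fills the timestamps of the 1s to find, for every row, the nearest 1 on each side. The helper name fill_short_gaps and the sample data are assumptions for illustration. Unlike the code above, it measures the interval between the two enclosing 1s, as the question describes, so with one-minute sampling at most three consecutive 0s are filled.

```python
import pandas as pd


def fill_short_gaps(frame, max_gap=pd.Timedelta(minutes=5)):
    """Return a copy where 0s enclosed by 1s less than max_gap apart become 1."""
    ts = frame["timestamp"]
    is_one = frame["pump_state"].eq(1)
    # Timestamp of the nearest 1 at or before / at or after each row (NaT if none).
    prev_one = ts.where(is_one).ffill()
    next_one = ts.where(is_one).bfill()
    # Comparisons involving NaT are False, so leading/trailing 0s stay untouched.
    short = (next_one - prev_one) < max_gap
    out = frame.copy()
    out.loc[~is_one & short, "pump_state"] = 1
    return out


# Hypothetical sample: a 3-row gap (enclosing 1s 4 minutes apart)
# and an 11-row gap (enclosing 1s 12 minutes apart).
sample = pd.DataFrame({
    "timestamp": pd.date_range("2019-05-29 00:00:00", periods=30, freq="1min"),
    "pump_state": [1] * 30,
})
sample.loc[5:7, "pump_state"] = 0
sample.loc[15:25, "pump_state"] = 0

result = fill_short_gaps(sample)
```

Here the short gap in rows 5 to 7 is filled and the long gap in rows 15 to 25 is kept. Changing the comparison to `<=` would make the threshold inclusive, if that is the intended reading of "less than 5 minutes".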