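Difference between values based on a condition using pandas

Time: 2019-11-12 05:42:27

Tags: python pandas dataframe

I have two columns containing time and status values. When the status changes from the first occurrence of 1 to the first occurrence of 0, I need to get the difference between the two times.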

time              status
01-07-2019 13:24    1
02-07-2019 04:02    1
02-07-2019 04:17    0
02-07-2019 04:21    1
02-07-2019 04:35    0
02-07-2019 04:36    1

I tried the following code:

if (df1['status'] == 1)
     df1['time_diff'] = df1['time'].sub(df1['time'], axis = 0) 
     print(df1) 

2 answers:

Answer 0 (score: 2)

holder = None    # timestamp of the first 1
holding = False  # True once a 1 has been seen

for index, row in df.iterrows():  # iterate through each row of the dataframe
    if row['status'] == 1:
        if holding:
            continue  # already holding a timestamp; skip further 1s
        else:
            holder = row['time']
            holding = True  # hold the first timestamp the loop reads
            continue
    elif row['status'] == 0:
        if holding:
            print(row['time'] - holder)  # time of the first 0 minus the held timestamp
            break
        else:
            continue

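If you want every difference each time the status changes from 1 to 0, stored in a list or whatever structure you need, you can make a few changes to this loop. I wrote it so that it only computes the first occurrence of 1 to 0. Just make sure the dtype of the time column is datetime.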
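The change mentioned above (collecting every 1-to-0 gap instead of stopping at the first) could look roughly like the sketch below. The function name `one_to_zero_gaps` is made up for illustration, and the day-first date format is an assumption about the question's data.

```python
import pandas as pd

# Sample data from the question. The "%d-%m-%Y %H:%M" format (day first)
# is an assumption; adjust it if the dates are actually month-first.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["01-07-2019 13:24", "02-07-2019 04:02", "02-07-2019 04:17",
         "02-07-2019 04:21", "02-07-2019 04:35", "02-07-2019 04:36"],
        format="%d-%m-%Y %H:%M",
    ),
    "status": [1, 1, 0, 1, 0, 1],
})

def one_to_zero_gaps(frame):
    """Return the time from the first 1 of each run to the next 0."""
    gaps = []
    start = None  # timestamp of the first 1 in the current run
    for _, row in frame.iterrows():
        if row["status"] == 1 and start is None:
            start = row["time"]               # remember the first 1 only
        elif row["status"] == 0 and start is not None:
            gaps.append(row["time"] - start)  # the first 0 closes the run
            start = None                      # wait for the next 1
    return gaps

gaps = one_to_zero_gaps(df)
print(gaps)
```

A trailing 1 with no 0 after it (the last row here) simply produces no entry.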

Answer 1 (score: 0)

First, let's make sure the time column is of datetime type:

df['time'] = pd.to_datetime(df['time'])

diff = [pd.Timedelta(0)]  # first row has nothing to compare with

Then we can use zip to walk over the status column and the index together:

for st, i in zip(df['status'], df.index):  # assumes the default 0..n-1 index
    if i > 0:  # row 0 has no previous row
        if st != df['status'][i-1]:  # status changed from the previous row
            diff.append(df['time'][i] - df['time'][i-1])
        else:
            diff.append(pd.Timedelta(0))  # no change in status

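Now add the list of time differences as a new column. Note that I don't know how you want to handle the "status unchanged" case; I assumed it should be a time difference of 0.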

df['dif_time'] = diff   

print(df)

             time           status dif_time
0 2019-01-07 13:24:00       1       00:00:00
1 2019-02-07 04:02:00       1       00:00:00
2 2019-02-07 04:17:00       0       00:15:00
3 2019-02-07 04:21:00       1       00:04:00
4 2019-02-07 04:35:00       0       00:14:00
5 2019-02-07 04:36:00       1       00:01:00
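
The same column can probably be built without an explicit loop. The following is a minimal sketch using `Series.diff` and `Series.where`, not part of the original answer; the day-first date format is an assumption, and it does not affect the same-day differences shown above.

```python
import pandas as pd

# Sample data from the question; the day-first format is an assumption.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["01-07-2019 13:24", "02-07-2019 04:02", "02-07-2019 04:17",
         "02-07-2019 04:21", "02-07-2019 04:35", "02-07-2019 04:36"],
        format="%d-%m-%Y %H:%M",
    ),
    "status": [1, 1, 0, 1, 0, 1],
})

# True where the status differs from the previous row.
# The first row has no predecessor, so fillna(0) marks it as unchanged.
changed = df["status"].diff().fillna(0).ne(0)

# Keep the row-to-row time difference only where the status changed,
# and use a zero timedelta everywhere else.
df["dif_time"] = df["time"].diff().where(changed, pd.Timedelta(0))

print(df)
```

Because every value is a real timedelta, the resulting column can be summed or compared directly.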