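I have two columns containing a time and a status value. I need to get the difference between the time at which the status is first 1 and the time at which it first changes to 0.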
time status
01-07-2019 13:24 1
02-07-2019 04:02 1
02-07-2019 04:17 0
02-07-2019 04:21 1
02-07-2019 04:35 0
02-07-2019 04:36 1
I tried the following code:
if (df1['status'] == 1)
df1['time_diff'] = df1['time'].sub(df1['time'], axis = 0)
print(df1)
Answer 0 (score: 2)
holder = None
holding = False
for index, row in df.iterrows():  # Iterate through each row of the DataFrame
    if row['status'] == 1:
        if holding:
            continue  # Already holding a start time; skip further 1s
        else:
            holder = row['time']
            holding = True  # Hold the first timestamp whose status is 1
            continue
    elif row['status'] == 0:
        if holding:
            print(row['time'] - holder)  # Time of the first 0 minus the held timestamp
            break
        else:
            continue
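If you want the difference every time the status changes from 1 to 0, with the results stored in a list or whatever you need, you can make a few changes to this loop. I wrote it this way so that it only computes the first change from 1 to 0. Just make sure the time column has a datetime dtype.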
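As a rough sketch of the "all differences in a list" variant mentioned above: the helper name one_to_zero_durations is made up for illustration, and the sample frame assumes the dates in the question are day-first (dd-mm-yyyy), which the question does not state.

```python
import pandas as pd

def one_to_zero_durations(df, time_col="time", status_col="status"):
    """Return one Timedelta per run of 1s that is closed by a 0 (hypothetical helper)."""
    durations = []
    start = None  # timestamp of the first 1 in the current run
    for t, s in zip(df[time_col], df[status_col]):
        if s == 1 and start is None:
            start = t  # first 1 of a new run
        elif s == 0 and start is not None:
            durations.append(t - start)  # first 0 after the run
            start = None
    return durations

# Sample data from the question; day-first parsing is an assumption.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["01-07-2019 13:24", "02-07-2019 04:02", "02-07-2019 04:17",
         "02-07-2019 04:21", "02-07-2019 04:35", "02-07-2019 04:36"],
        format="%d-%m-%Y %H:%M",
    ),
    "status": [1, 1, 0, 1, 0, 1],
})

durations = one_to_zero_durations(df)
print(durations)
```

A trailing 1 that is never followed by a 0 (the last row here) contributes nothing to the list.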
Answer 1 (score: 0)
First, let's make sure the time column is a datetime type:
import pandas as pd
df['time'] = pd.to_datetime(df['time'])
diff = [pd.Timedelta(0)]  # first value, has nothing to compare with
Then we can use zip to access the index and the data column at the same time (this relies on the default 0, 1, 2, ... index):
for st, i in zip(df['status'], df.index):
    if i > 0:  # cannot look at the previous row from index 0
        if df['status'][i] != df['status'][i-1]:  # current row vs. previous row
            diff.append(df['time'][i] - df['time'][i-1])
        else:
            diff.append(pd.Timedelta(0))  # no change in status
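Now add the list of time differences as a new column. Note that I don't know how you want to handle the "no change in status" case; I assumed the time difference should be 0.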
df['dif_time'] = diff
print(df)
                 time  status  dif_time
0 2019-01-07 13:24:00       1  00:00:00
1 2019-02-07 04:02:00       1  00:00:00
2 2019-02-07 04:17:00       0  00:15:00
3 2019-02-07 04:21:00       1  00:04:00
4 2019-02-07 04:35:00       0  00:14:00
5 2019-02-07 04:36:00       1  00:01:00
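The same dif_time column can probably be built without an explicit loop. The following is only a sketch using shift and diff, not the method from the answer; it keeps the answer's assumption that an unchanged status means a difference of 0, and it assumes day-first dates when building the sample frame.

```python
import pandas as pd

# Sample data from the question; day-first parsing is an assumption.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["01-07-2019 13:24", "02-07-2019 04:02", "02-07-2019 04:17",
         "02-07-2019 04:21", "02-07-2019 04:35", "02-07-2019 04:36"],
        format="%d-%m-%Y %H:%M",
    ),
    "status": [1, 1, 0, 1, 0, 1],
})

# True where the status differs from the previous row.
changed = df["status"].ne(df["status"].shift())
changed.iloc[0] = False  # the first row has no previous row to compare with

# Row-to-row time difference, kept only where the status changed.
df["dif_time"] = df["time"].diff().where(changed, pd.Timedelta(0))
print(df)
```

Unlike the loop, this does not depend on the index being 0, 1, 2, ..., because shift and diff work by position.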