I have the following DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Day': ['15', '15', '15', '16', '16', '17', '17', '17', '17'],
    'Month': ['10', '10', '10', '10', '10', '10', '10', '10', '10'],
    'Year': ['2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019', '2019'],
    'Hour': ['14', '14', '14', '14', '14', '14', '15', '15', '15'],
    'Minute': ['33', '41', '45', '46', '58', '59', '01', '02', '03'],
    'Second': ['16', '17', '19', '19', '20', '0', '0', '0', '0'],
    'depth': [40000, 39000, 13000, 40000, 39500, 35000, 34500, 35000, 34600],
})
I use the following line to create a new date column:
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second']])
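There is a limit on the time difference between rows and a limit on the difference between their depths. So I implemented the following code: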
df['Status'] = np.nan
for i in range(0, len(df)):
    for j in range(i+1, len(df)):
        date_init = pd.to_datetime(df['Date'].iloc[i])
        date_next = pd.to_datetime(df['Date'].iloc[j])
        if abs(date_init - date_next) < pd.to_timedelta('0 days 00:10:00'):  # 10 minutes
            # Calculate the depth variation
            var_delta_sensor = abs(df['depth'].iloc[i] - df['depth'].iloc[j])
            if var_delta_sensor < 1500:
                # The depth is valid, so accept both rows
                df['Status'].iloc[i] = 'ACCEPT'
                df['Status'].iloc[j] = 'ACCEPT'
            else:
                # Entering here means that the depth is not valid
                print("NOT depth")
        else:
            # The difference between elements i and j is 10 minutes or more
            i = j
            j = i + 1
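The output is shown below. It is correct and exactly what I need, but I am using two for loops, which is very slow. I need to run this on a DataFrame with 20,000 rows. I would like to learn a new, faster way to get the same result. Thanks.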
print(df)
Day  Month  Year  Hour  Minute  Second  depth  Date                 Status
15   10     2019  14    33      16      40000  2019-10-15 14:33:16  ACCEPT
15   10     2019  14    41      17      39000  2019-10-15 14:41:17  ACCEPT
15   10     2019  14    45      19      13000  2019-10-15 14:45:19  NaN
16   10     2019  14    46      19      40000  2019-10-16 14:46:19  NaN
16   10     2019  14    58      20      39500  2019-10-16 14:58:20  NaN
17   10     2019  14    59      0       35000  2019-10-17 14:59:00  ACCEPT
17   10     2019  15    01      0       34500  2019-10-17 15:01:00  ACCEPT
17   10     2019  15    02      0       35000  2019-10-17 15:02:00  ACCEPT
17   10     2019  15    03      0       34600  2019-10-17 15:03:00  ACCEPT
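Answer 0 (score: 1)
Without changing much of the code, if you only need to compare each row with the next one, you can replace the nested loops:
for i in range(0, len(df)):
    for j in range(i+1, len(df)):
with:
for i in range(0, len(df)-1):
    j = i+1
This saves a lot of work: the loop makes a single pass over the rows instead of comparing every pair.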
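As a rough illustration of the single-loop idea, here is a minimal sketch that compares each row only with the one after it. The names `max_gap`, `max_depth_diff` and `status` are made up for this example, the data is rebuilt from the question's sample with the date column written out directly, and the rows are assumed to be sorted by date. Because only neighbouring rows are compared, the result matches the question's output on this sample but is not guaranteed to match the nested loops on other data.

```python
import pandas as pd

# Sample rows from the question, with the Date column written out directly.
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2019-10-15 14:33:16', '2019-10-15 14:41:17', '2019-10-15 14:45:19',
        '2019-10-16 14:46:19', '2019-10-16 14:58:20', '2019-10-17 14:59:00',
        '2019-10-17 15:01:00', '2019-10-17 15:02:00', '2019-10-17 15:03:00',
    ]),
    'depth': [40000, 39000, 13000, 40000, 39500, 35000, 34500, 35000, 34600],
})

max_gap = pd.Timedelta(minutes=10)   # hypothetical name: time limit
max_depth_diff = 1500                # hypothetical name: depth limit

# Work on plain Python lists to avoid repeated .iloc lookups in the loop.
dates = df['Date'].tolist()
depths = df['depth'].tolist()
status = [None] * len(df)

for i in range(len(df) - 1):
    j = i + 1  # only the next row is compared
    close_in_time = abs(dates[j] - dates[i]) < max_gap
    close_in_depth = abs(depths[j] - depths[i]) < max_depth_diff
    if close_in_time and close_in_depth:
        status[i] = 'ACCEPT'
        status[j] = 'ACCEPT'

df['Status'] = status
print(df)
```

Collecting the results in a list and assigning the column once at the end also avoids the chained `df['Status'].iloc[i] = ...` assignment, which pandas may warn about.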
Another solution is to use the diff function to compute var_time and var_delta_sensor: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html
For example:
import datetime

import numpy as np
import pandas as pd

df['var_time'] = df['Date'].diff()
df['var_delta_sensor'] = df['depth'].diff().abs()
time_delta = datetime.timedelta(minutes=10)  # 10 minutes
# None is used as the fallback: combining a string with np.nan in np.where
# produces the text 'nan' rather than a missing value.
df['status'] = np.where((df['var_time'] < time_delta) & (df['var_delta_sensor'] < 1500), 'Accept', None)
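Note that a diff-based check only flags the later row of each qualifying pair, while the loop in the question marks both rows. A possible way to close that gap, sketched below, is to shift the mask up by one row and combine the two. The names `ok_with_prev`, `ok_with_next` and `accepted` are invented for this example, the data is rebuilt from the question's sample, and the rows are assumed to be sorted by date. It still compares adjacent rows only, so it reproduces the question's output on this sample but may differ from the nested loops on other data.

```python
import pandas as pd

# Sample rows from the question, with the Date column written out directly.
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2019-10-15 14:33:16', '2019-10-15 14:41:17', '2019-10-15 14:45:19',
        '2019-10-16 14:46:19', '2019-10-16 14:58:20', '2019-10-17 14:59:00',
        '2019-10-17 15:01:00', '2019-10-17 15:02:00', '2019-10-17 15:03:00',
    ]),
    'depth': [40000, 39000, 13000, 40000, 39500, 35000, 34500, 35000, 34600],
})

max_gap = pd.Timedelta(minutes=10)   # hypothetical name: time limit
max_depth_diff = 1500                # hypothetical name: depth limit

# True where a row is within both limits of the PREVIOUS row.
# The first row has no predecessor (NaT / NaN), so it compares as False.
ok_with_prev = (df['Date'].diff() < max_gap) & (df['depth'].diff().abs() < max_depth_diff)

# Shift the mask up one row to flag the earlier row of each qualifying pair.
ok_with_next = ok_with_prev.shift(-1, fill_value=False)

accepted = ok_with_prev | ok_with_next

# Rows that are not accepted are left as missing values.
df['Status'] = pd.Series('ACCEPT', index=df.index).where(accepted)
print(df)
```

Everything here is column-wise, so there is no Python-level loop over the 20,000 rows.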