Check whether a value occurred in the last n minutes

Time: 2017-03-29 09:47:59

Tags: python pandas datetime

Suppose we have the following data:

import pandas as pd

timestamp = ['2016-01-09_14-49-18','2016-01-10_09-48-59','2016-01-10_09-50-29','2016-01-10_09-59-08','2016-01-10_10-33-01','2016-01-10_10-35-01','2016-01-10_10-39-05','2016-01-10_10-40-38','2016-01-10_10-50-55','2016-01-10_12-28-35','2016-01-10_15-13-34','2016-01-10_17-02-44','2016-01-10_17-05-48','2016-01-10_17-13-44','2016-01-10_17-15-52']
feature = ['A','A','B','C','B','C','C','A','A','A','B','A','C','C','A']
df = pd.DataFrame({'timestamp':timestamp, 'feature':feature})

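How can I create a new column for each feature that indicates whether that class occurred within, say, the last 15 minutes?

Desired result: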

   feature    timestamp              A    B    C
0       A  2016-01-09_14-49-18       1    0    0
1       A  2016-01-10_09-48-59       1    0    0
2       B  2016-01-10_09-50-29       1    1    0
3       C  2016-01-10_09-59-08       1    1    1
4       B  2016-01-10_10-33-01       0    1    0
5       C  2016-01-10_10-35-01       0    1    1
6       C  2016-01-10_10-39-05       0    1    1
7       A  2016-01-10_10-40-38       1    1    1
8       A  2016-01-10_10-50-55       1    0    1
9       A  2016-01-10_12-28-35       1    0    0
10      B  2016-01-10_15-13-34       0    1    0
11      A  2016-01-10_17-02-44       1    0    0
12      C  2016-01-10_17-05-48       1    0    1
13      C  2016-01-10_17-13-44       1    0    1
14      A  2016-01-10_17-15-52       1    0    1

where 1 = the class occurred in the last 15 minutes and 0 = it did not.

3 answers:

Answer 0: (score: 1)

Here is an example for one of the features. You can repeat it for the other columns/features in a loop:

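df = df.set_index(pd.to_datetime(df['timestamp'], format='%Y-%m-%d_%H-%M-%S'))
# start from each row's own time, then blank out the rows that are not 'A'
df['A'] = df.index
df.loc[df['feature'] != 'A', 'A'] = pd.NaT
# carry the time of the last 'A' forward and measure the gap to the current row
df['A'] = df['A'].ffill()
df['A'] = df.index - df['A']
df['A'] = df['A'] < pd.to_timedelta('15min')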

This produces the following DataFrame:

                    feature            timestamp      A
timestamp                                              
2016-01-09 14:49:18       A  2016-01-09_14-49-18   True
2016-01-10 09:48:59       A  2016-01-10_09-48-59   True
2016-01-10 09:50:29       B  2016-01-10_09-50-29   True
2016-01-10 09:59:08       C  2016-01-10_09-59-08   True
2016-01-10 10:33:01       B  2016-01-10_10-33-01  False
2016-01-10 10:35:01       C  2016-01-10_10-35-01  False
2016-01-10 10:39:05       C  2016-01-10_10-39-05  False
2016-01-10 10:40:38       A  2016-01-10_10-40-38   True
2016-01-10 10:50:55       A  2016-01-10_10-50-55   True
2016-01-10 12:28:35       A  2016-01-10_12-28-35   True
2016-01-10 15:13:34       B  2016-01-10_15-13-34  False
2016-01-10 17:02:44       A  2016-01-10_17-02-44   True
2016-01-10 17:05:48       C  2016-01-10_17-05-48   True
2016-01-10 17:13:44       C  2016-01-10_17-13-44   True
2016-01-10 17:15:52       A  2016-01-10_17-15-52   True

If you want 0 and 1 instead of bool, use astype(int) on the column.
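The loop mentioned above could look roughly like the sketch below. It rebuilds the question's data so it runs on its own; the names `now`, `last_seen` and `window` are illustrative, and it assumes the rows are already sorted by time.

```python
import pandas as pd

timestamp = ['2016-01-09_14-49-18', '2016-01-10_09-48-59', '2016-01-10_09-50-29',
             '2016-01-10_09-59-08', '2016-01-10_10-33-01', '2016-01-10_10-35-01',
             '2016-01-10_10-39-05', '2016-01-10_10-40-38', '2016-01-10_10-50-55',
             '2016-01-10_12-28-35', '2016-01-10_15-13-34', '2016-01-10_17-02-44',
             '2016-01-10_17-05-48', '2016-01-10_17-13-44', '2016-01-10_17-15-52']
feature = ['A', 'A', 'B', 'C', 'B', 'C', 'C', 'A', 'A', 'A', 'B', 'A', 'C', 'C', 'A']
df = pd.DataFrame({'timestamp': timestamp, 'feature': feature})
df = df.set_index(pd.to_datetime(df['timestamp'], format='%Y-%m-%d_%H-%M-%S'))

window = pd.Timedelta(minutes=15)
now = df.index.to_series()  # each row's own time, as a Series

for name in sorted(df['feature'].unique()):
    # time of the most recent row carrying this feature (NaT before its first occurrence)
    last_seen = now.where(df['feature'] == name).ffill()
    # comparisons against NaT are False, so rows before the first occurrence become 0
    df[name] = ((now - last_seen) < window).astype(int)
```

The `astype(int)` call at the end of the loop body is what turns the True/False result into 1/0.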

Answer 1: (score: 1)

from datetime import timedelta

# prepare cols
df["A"] = 0
df["B"] = 0
df["C"] = 0

# convert to datetime
df["timestamp"] = pd.to_datetime(df["timestamp"],format="%Y-%m-%d_%H-%M-%S")

feature_list = ["A","B","C"]
for curr_index, row in df.iterrows():
    # look the timestamp up by name; positional access depends on the column order
    curr_time = row["timestamp"]
    # rows that fall in the window (curr_time - 15 min, curr_time]
    temp_df = df.loc[(df.timestamp <= curr_time)&(df.timestamp > curr_time-timedelta(minutes=15))]
    for feature_i in feature_list:
        if feature_i in temp_df.feature.tolist():
            df.loc[curr_index, feature_i] = 1
        else:
            df.loc[curr_index, feature_i] = 0

Output:

   feature           timestamp  A  B  C
0        A 2016-01-09 14:49:18  1  0  0
1        A 2016-01-10 09:48:59  1  0  0
2        B 2016-01-10 09:50:29  1  1  0
3        C 2016-01-10 09:59:08  1  1  1
4        B 2016-01-10 10:33:01  0  1  0
5        C 2016-01-10 10:35:01  0  1  1
6        C 2016-01-10 10:39:05  0  1  1
7        A 2016-01-10 10:40:38  1  1  1
8        A 2016-01-10 10:50:55  1  0  1
9        A 2016-01-10 12:28:35  1  0  0
10       B 2016-01-10 15:13:34  0  1  0
11       A 2016-01-10 17:02:44  1  0  0
12       C 2016-01-10 17:05:48  1  0  1
13       C 2016-01-10 17:13:44  1  0  1
14       A 2016-01-10 17:15:52  1  0  1
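One detail of the filter in this answer is easy to miss: the window is open on the left and closed on the right, so a row exactly 15 minutes old does not count. The sketch below illustrates that with made-up timestamps; `demo` and `seen_recently` are hypothetical names, not part of the answer.

```python
from datetime import timedelta

import pandas as pd

# hypothetical data: 'A' once, then 'B' exactly 15 minutes later and again shortly after
times = pd.to_datetime(['2016-01-10 10:00:00',
                        '2016-01-10 10:15:00',
                        '2016-01-10 10:29:59'])
demo = pd.DataFrame({'timestamp': times, 'feature': ['A', 'B', 'B']})


def seen_recently(frame, now, name, minutes=15):
    """Return True if `name` occurs in the window (now - minutes, now]."""
    in_window = frame.loc[(frame.timestamp <= now)
                          & (frame.timestamp > now - timedelta(minutes=minutes))]
    return name in in_window.feature.tolist()


exactly_15 = seen_recently(demo, times[1], 'A')  # 'A' is exactly 15 minutes old
just_under = seen_recently(demo, times[2], 'B')  # 'B' is 14 min 59 s old
```

Here `exactly_15` is False and `just_under` is True. If the boundary should be included, changing `>` to `>=` in the filter would be one way to do it.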

Answer 2: (score: 1)

You can use:

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d_%H-%M-%S')

for col in df['feature'].unique():
    df[col] = df['timestamp'] - df['timestamp'].where(df['feature'] == col).ffill()
    df[col] = (df[col] < pd.to_timedelta('15min')).astype(int)
print(df)

   feature           timestamp  A  B  C
0        A 2016-01-09 14:49:18  1  0  0
1        A 2016-01-10 09:48:59  1  0  0
2        B 2016-01-10 09:50:29  1  1  0
3        C 2016-01-10 09:59:08  1  1  1
4        B 2016-01-10 10:33:01  0  1  0
5        C 2016-01-10 10:35:01  0  1  1
6        C 2016-01-10 10:39:05  0  1  1
7        A 2016-01-10 10:40:38  1  1  1
8        A 2016-01-10 10:50:55  1  0  1
9        A 2016-01-10 12:28:35  1  0  0
10       B 2016-01-10 15:13:34  0  1  0
11       A 2016-01-10 17:02:44  1  0  0
12       C 2016-01-10 17:05:48  1  0  1
13       C 2016-01-10 17:13:44  1  0  1
14       A 2016-01-10 17:15:52  1  0  1
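As a possible alternative to looping over the features, a time-based rolling window over indicator columns should give the same result. This is only a sketch and not part of the answer above: it swaps in `pd.get_dummies` and `rolling('15min')`, the names `dummies`, `recent` and `result` are illustrative, and it assumes the timestamps are sorted in ascending order (offset-based windows require a monotonic index).

```python
import pandas as pd

timestamp = ['2016-01-09_14-49-18', '2016-01-10_09-48-59', '2016-01-10_09-50-29',
             '2016-01-10_09-59-08', '2016-01-10_10-33-01', '2016-01-10_10-35-01',
             '2016-01-10_10-39-05', '2016-01-10_10-40-38', '2016-01-10_10-50-55',
             '2016-01-10_12-28-35', '2016-01-10_15-13-34', '2016-01-10_17-02-44',
             '2016-01-10_17-05-48', '2016-01-10_17-13-44', '2016-01-10_17-15-52']
feature = ['A', 'A', 'B', 'C', 'B', 'C', 'C', 'A', 'A', 'A', 'B', 'A', 'C', 'C', 'A']
df = pd.DataFrame({'timestamp': timestamp, 'feature': feature})
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d_%H-%M-%S')

# one 0/1 indicator column per feature, indexed by time
dummies = pd.get_dummies(df['feature'], dtype=float)
dummies.index = pd.DatetimeIndex(df['timestamp'])

# for offset windows the default interval is (t - 15min, t], so the current row
# is always included and a row exactly 15 minutes old is not
recent = dummies.rolling('15min').max().astype(int)

# put the indicators back next to the original columns
result = df.join(recent.reset_index(drop=True))
```

The window boundaries match the `< pd.to_timedelta('15min')` comparison used in the loop, so the A, B and C columns in `result` should agree with the table above.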