I have the following pandas DataFrame set up to import from a CSV:
import pandas as pd

df = pd.read_csv('file_path',
                 parse_dates={'timestamp': ['Date', 'Time']},
                 index_col='timestamp',
                 usecols=['Date', 'Time', 'X'])
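Recent pandas releases deprecate the dict form of parse_dates that combines several columns. A minimal sketch of an equivalent import that joins Date and Time explicitly, assuming the CSV stores dates as YYYY-MM-DD and times as HH:MM:SS (the inline csv_text below is hypothetical and stands in for the real file):

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: Date, Time and X columns,
# with only the first few rows from the question.
csv_text = """Date,Time,X
2015-08-25,16:52:10,95
2015-08-25,16:52:12,84
2015-08-25,16:52:14,86
"""

df = pd.read_csv(io.StringIO(csv_text), usecols=["Date", "Time", "X"])

# Join the two text columns and parse them into a single datetime column,
# then make it the index (same result as parse_dates + index_col).
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.drop(columns=["Date", "Time"]).set_index("timestamp")
print(df)
```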
So it ends up with the datetime as the index and an int64 column 'X' for the values.
My data looks like this, with two columns:
X
timestamp
2015-08-25 16:52:10 95
2015-08-25 16:52:12 84
2015-08-25 16:52:14 86
2015-08-25 16:52:16 84
2015-08-25 16:52:18 85
2015-08-25 16:52:20 86
2015-08-25 16:52:22 84
2015-08-25 16:52:24 95
2015-08-25 16:52:28 95
2015-08-25 16:52:48 80
2015-08-25 16:52:50 85
2015-08-25 16:52:52 85
2015-08-25 16:52:54 84
2015-08-25 16:52:56 85
2015-08-25 16:52:58 86
2015-08-25 16:53:00 85
2015-08-25 16:53:02 85
2015-08-25 16:53:04 85
2015-08-25 16:53:06 86
2015-08-25 16:53:08 85
2015-08-25 16:53:10 85
However, the interval is not always consistent. Sometimes my data points are more than two seconds apart (e.g. 16:52:28 to 16:52:48).
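One way to locate these irregular intervals is to take the difference between consecutive index values. A short sketch using a few timestamps from the data above, assuming 2 seconds is the nominal sampling interval:

```python
import pandas as pd

# A few rows from the question, including the 16:52:28 -> 16:52:48 gap.
idx = pd.to_datetime([
    "2015-08-25 16:52:24",
    "2015-08-25 16:52:28",
    "2015-08-25 16:52:48",
    "2015-08-25 16:52:50",
])
df = pd.DataFrame({"X": [95, 95, 80, 85]}, index=idx)

# Time elapsed since the previous sample (NaT for the first row).
gaps = df.index.to_series().diff()

# Samples that arrive more than 2 seconds after the previous one.
big_gaps = gaps[gaps > pd.Timedelta(seconds=2)]
print(big_gaps)
```

Note that 16:52:24 to 16:52:28 (4 s) is flagged as well, not only the 20-second gap.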
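I want the values where X is in [84, 86], but only when they occur for at least 10 consecutive seconds.
So in my DataFrame, I want Python to return a count only for 16:52:12-16:52:22 and 16:52:50-16:53:10.
How do I tell Python that, with 16:52:50-16:53:10, the count is 2? I can code a specific time interval, but how do I translate "at least Y consecutive seconds" into Python?
Thanks in advance.
Edit: To clarify, my preferred output would be a count of the number of times event Y occurs in the sample set. Event Y occurs when X stays within a value range for at least 10 consecutive seconds. So, for example, if X is within 84-86 for at least 10 consecutive seconds, I want that to count as 1.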
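A minimal sketch of one way to compute this count, under two assumptions the question does not pin down: a run's duration is measured from its first to its last sample, and a gap of more than 2 seconds between samples breaks a run. It uses the common run-length trick of comparing a boolean mask with its shifted self and taking a cumulative sum. count_sustained_events and its parameters are hypothetical names.

```python
import pandas as pd


def count_sustained_events(x, low=84, high=86, min_duration="10s", max_gap="2s"):
    """Count runs where x stays within [low, high] for at least min_duration.

    x must be a Series with a sorted DatetimeIndex. A run ends when a value
    falls outside the range or when consecutive samples are more than
    max_gap apart (an assumption about what "consecutive" means).
    """
    in_range = x.between(low, high)  # inclusive on both ends
    times = x.index.to_series()
    gap = times.diff() > pd.Timedelta(max_gap)
    # New run id whenever the in/out state flips or a large gap occurs.
    run_id = (in_range.ne(in_range.shift()) | gap).cumsum()
    # First and last timestamp of every in-range run.
    runs = times[in_range].groupby(run_id[in_range]).agg(["min", "max"])
    durations = runs["max"] - runs["min"]
    return int((durations >= pd.Timedelta(min_duration)).sum())


# The question's data, as seconds after 16:52:10.
offsets = [0, 2, 4, 6, 8, 10, 12, 14, 18, 38, 40,
           42, 44, 46, 48, 50, 52, 54, 56, 58, 60]
values = [95, 84, 86, 84, 85, 86, 84, 95, 95, 80, 85,
          85, 84, 85, 86, 85, 85, 85, 86, 85, 85]
idx = pd.Timestamp("2015-08-25 16:52:10") + pd.to_timedelta(offsets, unit="s")
x = pd.Series(values, index=idx, name="X")

print(count_sustained_events(x))
```

With these rules the data above yields 2: 16:52:12-16:52:22 (10 s) and 16:52:50-16:53:10 (20 s). Passing a different min_duration covers the general "at least Y consecutive seconds" case.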
Answer (score: 0)
I'm not sure exactly what you want to do, but here is an answer that should at least help clarify the expected result.
import pandas as pd

# Test data
df = pd.DataFrame([('2015-08-25 16:52:10', 95),
('2015-08-25 16:52:12', 84),
('2015-08-25 16:52:14', 86),
('2015-08-25 16:52:16', 84),
('2015-08-25 16:52:18', 85),
('2015-08-25 16:52:20', 86),
('2015-08-25 16:52:22', 84),
('2015-08-25 16:52:24', 95),
('2015-08-25 16:52:28', 95),
('2015-08-25 16:52:48', 80),
('2015-08-25 16:52:50', 85),
('2015-08-25 16:52:52', 85),
('2015-08-25 16:52:54', 84),
('2015-08-25 16:52:56', 85),
('2015-08-25 16:52:58', 86),
('2015-08-25 16:53:00', 85),
('2015-08-25 16:53:02', 85),
('2015-08-25 16:53:04', 85),
('2015-08-25 16:53:06', 86),
('2015-08-25 16:53:08', 85),
('2015-08-25 16:53:10', 85)],
columns=['timestamp', 'x'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
# Define a period column to indicate the period when the values occur
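# (pd.TimeGrouper is no longer available in current pandas, so number the
# 10-second bins directly; empty bins are skipped, e.g. there is no period 2)
start = df.index[0].floor('10s')
df['period'] = (df.index - start) // pd.Timedelta('10s')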
# Group by period and value and count the values, to see the distinct values and how many times each occurs per period
df = df.reset_index()
grouped = df.groupby(['period','x']).count()
print(grouped.head(10))
timestamp
period x
0 84 2
85 1
86 1
95 1
1 84 1
86 1
95 2
3 80 1
4 84 1
85 3
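Fixed 10-second bins can split a sustained run across two bins. A minimal check, rebuilding the test data and assuming the same [84, 86] range, marks which bins lie entirely within that range:

```python
import pandas as pd

# The question's data, as seconds after 16:52:10.
offsets = [0, 2, 4, 6, 8, 10, 12, 14, 18, 38, 40,
           42, 44, 46, 48, 50, 52, 54, 56, 58, 60]
values = [95, 84, 86, 84, 85, 86, 84, 95, 95, 80, 85,
          85, 84, 85, 86, 85, 85, 85, 86, 85, 85]
idx = pd.Timestamp("2015-08-25 16:52:10") + pd.to_timedelta(offsets, unit="s")
df = pd.DataFrame({"x": values}, index=idx)

# 10-second bin numbers, matching the "period" column above.
start = df.index[0].floor("10s")
df["period"] = (df.index - start) // pd.Timedelta("10s")

# True if every sample in the bin lies within [84, 86].
all_in_range = df.groupby("period")["x"].apply(lambda s: s.between(84, 86).all())
print(all_in_range)
```

Only periods 4, 5 and 6 come out True; the run 16:52:12-16:52:22 is missed because it straddles periods 0 and 1, each of which also contains a 95.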