With pd.Grouper we can group by time, for example using a frequency of 10s. The following data:
Time Count
10:05:03 2
10:05:04 3
10:05:05 4
10:05:11 3
10:05:12 4
gives the following result:
Time Count
10:05:10 9
10:05:20 7
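For reference, here is a minimal sketch of that time-based grouping. The `label="right"` argument is an assumption, chosen because the result above names each bin by its right edge. Parsing with `format="%H:%M:%S"` is also an assumption and simply places the times on a default date.

```python
import pandas as pd

# Sample data from the question. Only a time of day is parsed,
# so pandas fills in a default date (1900-01-01).
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["10:05:03", "10:05:04", "10:05:05", "10:05:11", "10:05:12"],
        format="%H:%M:%S",
    ),
    "Count": [2, 3, 4, 3, 4],
})

# Assumption: label="right" names each 10-second bin by its right edge,
# which matches the 10:05:10 / 10:05:20 labels in the result above.
binned = df.groupby(pd.Grouper(key="Time", freq="10s", label="right"))["Count"].sum()
print(binned)
```

The first three rows fall into the bin ending at 10:05:10 and the last two into the bin ending at 10:05:20.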
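I am looking for the opposite: can I group by count instead, for example every 5 counts, and get the time elapsed for each group? The expected result would be: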
Count Time (s)
5 (4-3)=1s
5 (11-5)=6s
5 (12-11)=1s
Thanks a lot!
Answer 0: (score: 0)
Maybe this is what you have in mind. Starting from a pandas Series df:
2018-03-14 06:38:46.308425+00:00 2
2018-03-14 06:38:47.308425+00:00 3
2018-03-14 06:38:48.308425+00:00 4
2018-03-14 06:38:54.308425+00:00 3
2018-03-14 06:38:55.308425+00:00 4
dtype: int64
Find the positions where the cumulative sum crosses a multiple of 5:
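# Round the running total down to the nearest multiple of 5
df[:] = df.values.cumsum() // 5 * 5
# Series.nonzero() was removed in pandas 1.0, so go through the NumPy array
hit5 = (df.diff() == 5).to_numpy().nonzero()[0]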
In this case it is array([1, 3, 4]). Then iterate over these positions and take the difference from the previous index value:
for i in hit5:
    print(df.index[i] - df.index[i-1])
which gives:
0 days 00:00:01
0 days 00:00:06
0 days 00:00:01
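The steps above can be put together as a self-contained sketch. The names `counts`, `step`, `level`, `hits` and `elapsed` are illustrative and not from the answer. Two deliberate changes: the original Series is left unmodified, and the test is `>= step` rather than `== 5`, on the assumption that a single row may push the running total past more than one multiple at once.

```python
import numpy as np
import pandas as pd

# Same data as the Series above (timezone-aware timestamps, integer counts).
counts = pd.Series(
    [2, 3, 4, 3, 4],
    index=pd.to_datetime([
        "2018-03-14 06:38:46.308425+00:00",
        "2018-03-14 06:38:47.308425+00:00",
        "2018-03-14 06:38:48.308425+00:00",
        "2018-03-14 06:38:54.308425+00:00",
        "2018-03-14 06:38:55.308425+00:00",
    ]),
)

step = 5  # group size in counts (hypothetical parameter name)

# Running total rounded down to the nearest multiple of `step`.
level = counts.cumsum() // step * step

# Positions where the rounded total jumps, i.e. a multiple of `step` is crossed.
# The first row has no predecessor, so its NaN difference is treated as 0.
hits = np.flatnonzero((level.diff().fillna(0) >= step).to_numpy())

# Time between each crossing row and the row just before it.
elapsed = counts.index[hits] - counts.index[hits - 1]
print(elapsed)
```

As in the loop above, a crossing on the very first row is not reported, because that row has no earlier timestamp to subtract.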
Answer 1: (score: 0)
If I understand your question correctly, you can try:
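import io
import pandas as pd

df_txt = """
Time Count
10:05:03 2
10:05:04 3
10:05:05 4
10:05:11 3
10:05:12 4"""
# Split on any whitespace so the sample parses with tabs or spaces
df = pd.read_csv(io.StringIO(df_txt), sep=r'\s+')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')

# Running total and the number of complete groups of 5 reached so far
df['CumCount'] = df.Count.cumsum()
df['Ind1'] = df.CumCount // 5
df['Ind2'] = df.Ind1.shift()

# Keep the previous row's time only where a new multiple of 5 is crossed
df['LagTime'] = df.Time.shift()
df.loc[df.Ind1 == df.Ind2, 'LagTime'] = pd.NaT

# Back-fill so each row carries the start time of the group it belongs to
df['StartTime'] = df.LagTime.bfill()

# The last row of each group is where the threshold is reached
out = df.groupby('StartTime', as_index=False).last()
out['Time (s)'] = out.Time.values - out.StartTime.values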
Output:
print(out['Time (s)'])
# 0 00:00:01
# 1 00:00:06
# 2 00:00:01
# Name: Time (s), dtype: timedelta64[ns]