Pandas: group by count (instead of by time)

Time: 2018-03-14 04:34:43

Tags: python pandas pandas-groupby

With pd.Grouper we can group by time, for example with a frequency of 10s:

Time      Count
10:05:03   2
10:05:04   3
10:05:05   4
10:05:11   3
10:05:12   4

which gives the following result:

Time  Count
10:05:10  9
10:05:20  7
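
For reference, a minimal sketch of the time-based grouping described above. The date part of the timestamps is arbitrary, and `label="right"` is an assumption made so that each bin is named by its end time, as in the table shown; the question does not say which options were used.

```python
import pandas as pd

# Sample data from the question; the date is an arbitrary placeholder.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-03-14 10:05:03",
        "2018-03-14 10:05:04",
        "2018-03-14 10:05:05",
        "2018-03-14 10:05:11",
        "2018-03-14 10:05:12",
    ]),
    "Count": [2, 3, 4, 3, 4],
})

# Sum Count in 10-second bins; label="right" (assumption) labels each bin
# with its right edge instead of the default left edge.
binned = df.groupby(pd.Grouper(key="Time", freq="10s", label="right"))["Count"].sum()
print(binned)
```

With the default `label="left"`, the same sums would be labeled 10:05:00 and 10:05:10 instead.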

I am looking for the opposite. Can I group the time by count, for example every time the count adds up to another 5?

Count Time (s)
5    (4-3)=1s
5    (11-5)=6s
5    (12-11)=1s

Thanks a lot!

2 Answers:

Answer 0: (score: 0)

Maybe this is what you have in mind. Starting from a pandas Series df:

2018-03-14 06:38:46.308425+00:00     2
2018-03-14 06:38:47.308425+00:00     3
2018-03-14 06:38:48.308425+00:00     4
2018-03-14 06:38:54.308425+00:00     3
2018-03-14 06:38:55.308425+00:00     4
dtype: int64

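Find the positions where the cumulative sum crosses a multiple of 5:

import numpy as np
df[:] = df.values.cumsum() // 5 * 5  # cumulative sum floored to a multiple of 5
hit5 = np.flatnonzero((df.diff() == 5).to_numpy())  # positions where the floored sum steps up by 5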

In this case it is array([1, 3, 4]). Then iterate over these positions and take the difference between each index value and the one before it:

for i in hit5:
    print(df.index[i] - df.index[i-1])

which gives:

0 days 00:00:01
0 days 00:00:06
0 days 00:00:01
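
The steps of this answer can be collected into one function. This is only a sketch: `time_per_count` is a hypothetical name, and it tests for `diff() > 0` rather than `diff() == 5`, an assumption made so that a single large count that jumps over more than one multiple of `step` is still detected. Like the code above, it never flags the first row, because there is no earlier timestamp to subtract.

```python
import numpy as np
import pandas as pd


def time_per_count(counts: pd.Series, step: int) -> pd.Series:
    """Elapsed time at each row where the running total reaches a new multiple of `step`."""
    # Number of complete `step`-sized chunks accumulated so far.
    chunks = counts.cumsum() // step
    # Positions where at least one new multiple of `step` is reached.
    hits = np.flatnonzero(chunks.diff().gt(0).to_numpy())
    # Gap between each timestamp and the previous one (NaT for the first row).
    gaps = counts.index.to_series().diff()
    return gaps.iloc[hits]


# Usage with the data from the question (the date is an arbitrary placeholder).
idx = pd.to_datetime([
    "2018-03-14 10:05:03",
    "2018-03-14 10:05:04",
    "2018-03-14 10:05:05",
    "2018-03-14 10:05:11",
    "2018-03-14 10:05:12",
])
s = pd.Series([2, 3, 4, 3, 4], index=idx)
result = time_per_count(s, 5)
print(result)
```

Unlike the original snippet, this version does not overwrite the input Series.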

Answer 1: (score: 0)

If I understand your question correctly, you can try:

import io
import numpy as np
import pandas as pd

df_txt = """
Time    Count
10:05:03    2
10:05:04    3
10:05:05    4
10:05:11    3
10:05:12    4"""

# The columns are separated by whitespace; parse Time as a time of day.
df = pd.read_csv(io.StringIO(df_txt), sep=r'\s+')
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
df['CumCount'] = df.Count.cumsum()
df['Ind1'] = df.CumCount // 5
df['Ind2'] = df.Ind1.shift()
df['LagTime'] = df.Time.shift()
df.loc[df.Ind1 == df.Ind2, 'LagTime'] = np.nan
df['StartTime'] = df.LagTime.bfill()
out = df.groupby(['StartTime'], as_index=False).last()
out['Time (s)'] = out.Time.values - out.StartTime.values

Output:

print(out['Time (s)'])
# 0   00:00:01
# 1   00:00:06
# 2   00:00:01
# Name: Time (s), dtype: timedelta64[ns]
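
To see why the back-fill yields the group boundaries, it may help to look at the intermediate columns. The sketch below rebuilds the same quantities more compactly; the names `chunk`, `new_chunk`, `start`, and `steps` are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["10:05:03", "10:05:04", "10:05:05", "10:05:11", "10:05:12"],
        format="%H:%M:%S",
    ),
    "Count": [2, 3, 4, 3, 4],
})

# Number of complete chunks of 5 accumulated at each row.
chunk = df["Count"].cumsum() // 5
# True where the chunk number differs from the previous row.
new_chunk = chunk != chunk.shift()
# Keep the previous timestamp only on those rows, then back-fill the gaps
# so that every row carries the start time of the group it belongs to.
start = df["Time"].shift().where(new_chunk).bfill()

steps = df.assign(Chunk=chunk, StartTime=start)
print(steps)
```

Grouping `steps` by `StartTime` and taking the last `Time` in each group then gives the end of each interval, as in the answer.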