Pandas: grouping into new groups when the time gap exceeds N minutes

Time: 2017-01-05 13:58:58

Tags: python pandas

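I have a data frame that contains a timestamp column, a satellite ID, and a site ID. My goal is to break the data set into individual "tracks", where each track is a unique combination of satellite and site ID. I can do this easily with the standard pandas groupby functionality by specifying by=['site', 'sat']. The additional caveat is that if there is a time gap of more than N minutes within a group, the data after the gap should become a new track.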

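My question: what is the best way to compute the time delta between consecutive rows within my (site, sat) groups, determine when that timedelta is greater than N minutes, and create a new group/track at that point?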

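I think I can use the diff() method to compute the time difference between rows. Ideally, there would be a way to add a third key to my groupby call that encapsulates the time limit I am using.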
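A minimal sketch of the diff() idea, assuming a single (site, sat) pair and a 20-minute limit. The toy frame and the column names delta_t and gap are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy frame: one (site, sat) pair with a 30-minute gap in the middle.
df = pd.DataFrame({
    "site": ["X", "X", "X", "X"],
    "sat": [1, 1, 1, 1],
    "time": pd.to_datetime([
        "2016-01-01 00:00:00",
        "2016-01-01 00:00:01",
        "2016-01-01 00:30:01",
        "2016-01-01 00:30:02",
    ]),
})

# Time difference between consecutive rows; the first row has no predecessor (NaT).
df["delta_t"] = df["time"].diff()

# True where the gap to the previous row exceeds the limit.
# A NaT comparison evaluates to False, so the first row is never flagged.
df["gap"] = df["delta_t"] > pd.Timedelta(minutes=20)

print(df)
```

Only the third row is flagged, which is where a new track would have to start.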

Below is some sample code that generates a test data set and performs the initial (site, sat) grouping.

import pandas as pd
import numpy as np

# Create first sample set.

N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345

# Create second sample set over the same time span but with a different sat

N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456


# Create a third sample set with a new site, starting one minute later, but
# with the same sat as the second set

N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456

# Create a fourth sample set with the same sat and site as the third set but
# more than 20 minutes after the third set.

N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456

# Build a data frame for each sample set

A = pd.DataFrame(index=A_times, data={'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(index=B_times, data={'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(index=C_times, data={'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(index=D_times, data={'data': D_data, 'site' : D_site, 'sat' : D_sat})

# mash them into one larger test data frame

test = pd.concat([A, B, C, D])

print(test)

my_groups = test.groupby(by = ['site', 'sat'])

for key, g in my_groups:
    print(key)
    print(g)

The output of print(test) is:

                     data    sat site
2016-01-01 00:00:00     0  12345    X
2016-01-01 00:00:01     1  12345    X
2016-01-01 00:00:02     2  12345    X
2016-01-01 00:00:03     3  12345    X
2016-01-01 00:00:04     4  12345    X
2016-01-01 00:00:05     5  12345    X
2016-01-01 00:00:06     6  12345    X
2016-01-01 00:00:07     7  12345    X
2016-01-01 00:00:08     8  12345    X
2016-01-01 00:00:09     9  12345    X
2016-01-01 00:00:00     0   3456    X
2016-01-01 00:00:01     1   3456    X
2016-01-01 00:00:02     2   3456    X
2016-01-01 00:00:03     3   3456    X
2016-01-01 00:00:04     4   3456    X
2016-01-01 00:01:00     0   3456    Y
2016-01-01 00:01:01     1   3456    Y
2016-01-01 00:01:02     2   3456    Y
2016-01-01 00:01:03     3   3456    Y
2016-01-01 00:01:04     4   3456    Y
2016-01-01 00:01:05     5   3456    Y
2016-01-01 00:01:06     6   3456    Y
2016-01-01 00:01:07     7   3456    Y
2016-01-01 00:01:08     8   3456    Y
2016-01-01 00:01:09     9   3456    Y
2016-01-01 01:00:00     0   3456    Y
2016-01-01 01:00:01     1   3456    Y
2016-01-01 01:00:02     2   3456    Y
2016-01-01 01:00:03     3   3456    Y
2016-01-01 01:00:04     4   3456    Y

and the individual groups are:

('X', 3456)
                     data   sat site
2016-01-01 00:00:00     0  3456    X
2016-01-01 00:00:01     1  3456    X
2016-01-01 00:00:02     2  3456    X
2016-01-01 00:00:03     3  3456    X
2016-01-01 00:00:04     4  3456    X
('X', 12345)
                     data    sat site
2016-01-01 00:00:00     0  12345    X
2016-01-01 00:00:01     1  12345    X
2016-01-01 00:00:02     2  12345    X
2016-01-01 00:00:03     3  12345    X
2016-01-01 00:00:04     4  12345    X
2016-01-01 00:00:05     5  12345    X
2016-01-01 00:00:06     6  12345    X
2016-01-01 00:00:07     7  12345    X
2016-01-01 00:00:08     8  12345    X
2016-01-01 00:00:09     9  12345    X
('Y', 3456)
                     data   sat site
2016-01-01 00:01:00     0  3456    Y
2016-01-01 00:01:01     1  3456    Y
2016-01-01 00:01:02     2  3456    Y
2016-01-01 00:01:03     3  3456    Y
2016-01-01 00:01:04     4  3456    Y
2016-01-01 00:01:05     5  3456    Y
2016-01-01 00:01:06     6  3456    Y
2016-01-01 00:01:07     7  3456    Y
2016-01-01 00:01:08     8  3456    Y
2016-01-01 00:01:09     9  3456    Y
2016-01-01 01:00:00     0  3456    Y
2016-01-01 01:00:01     1  3456    Y
2016-01-01 01:00:02     2  3456    Y
2016-01-01 01:00:03     3  3456    Y
2016-01-01 01:00:04     4  3456    Y

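The desired behavior is that, because the data contains a gap of more than 20 minutes, the third group above should actually be split into two groups, for example: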

('Y', 3456)
                     data   sat site
2016-01-01 00:01:00     0  3456    Y
2016-01-01 00:01:01     1  3456    Y
2016-01-01 00:01:02     2  3456    Y
2016-01-01 00:01:03     3  3456    Y
2016-01-01 00:01:04     4  3456    Y
2016-01-01 00:01:05     5  3456    Y
2016-01-01 00:01:06     6  3456    Y
2016-01-01 00:01:07     7  3456    Y
2016-01-01 00:01:08     8  3456    Y
2016-01-01 00:01:09     9  3456    Y

New group here

2016-01-01 01:00:00     0  3456    Y
2016-01-01 01:00:01     1  3456    Y
2016-01-01 01:00:02     2  3456    Y
2016-01-01 01:00:03     3  3456    Y
2016-01-01 01:00:04     4  3456    Y

Any suggestions would be greatly appreciated. Thanks!

1 answer:

Answer 0 (score: 1)

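After fiddling with unutbu's solution and combining it with the suggestions from this post, I was able to solve the problem. A more complete sample test set and the solution are shown below.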

# In[222]:

import pandas as pd
import numpy as np


# In[223]:

# Create 1st sample set.

N=10
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
A_data = np.arange(0, N)
A_site = ['X'] * N
A_sat = 12345

# Create 2nd sample set over the same time span but with a different sat

N=5
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s')
B_data = np.arange(0, N)
B_site = ['X'] * N
B_sat = 3456


# Create a 3rd sample set with a new site, starting one minute later, but
# with the same sat as the 2nd set

N = 10
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s')
C_data = np.arange(0, N)
C_site = ['Y'] * N
C_sat = 3456

# Create a 4th sample set with the same sat and site as the 3rd set but
# more than 20 minutes after the third set.

N = 5
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s')
D_data = np.arange(0, N)
D_site = ['Y'] * N
D_sat = 3456

# Create a 5th sample set with a new site (Z) and the same sat as the 3rd set,
# sampled once a minute for an hour so that it overlaps the other sets in time.

N = 60
E_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='60s')
E_data = np.arange(0, N)
E_site = ['Z'] * N
E_sat = 3456

# Create a 6th sample set with the same sat and site as the 4th set but
# more than 20 minutes after the fourth set.

N = 5
F_times = pd.date_range('2016-01-02T00:00:00', periods=N, freq='60s')
F_data = np.arange(0, N)
F_site = ['Y'] * N
F_sat = 3456


# In[224]:

# Build a data frame for each sample set

A = pd.DataFrame(data={'time': A_times, 'data': A_data, 'site' : A_site, 'sat' : A_sat})
B = pd.DataFrame(data={'time': B_times, 'data': B_data, 'site' : B_site, 'sat' : B_sat})
C = pd.DataFrame(data={'time': C_times, 'data': C_data, 'site' : C_site, 'sat' : C_sat})
D = pd.DataFrame(data={'time': D_times, 'data': D_data, 'site' : D_site, 'sat' : D_sat})
E = pd.DataFrame(data={'time': E_times, 'data': E_data, 'site' : E_site, 'sat' : E_sat})
F = pd.DataFrame(data={'time': F_times, 'data': F_data, 'site' : F_site, 'sat' : F_sat})

# mash them into one larger test data frame

test = pd.concat([A, B, C, D, E, F])


# In[225]:

print(test)


# In[226]:

test.sort_values(['time'], inplace=True)


# In[227]:

# This approach doesn't quite work. Note that the group (Y, 3456, 0) really 
# has 2 tracks in it because the overlapping track from (Z, 3456, 0) is screwing up
# the delta-t calculation and hiding the fact that within the Y, 3456 group there
# was a large time gap.

test1 = test.copy()
test1['delta_t'] = test1['time'].diff()
test1['track'] = (test1['delta_t'] > pd.Timedelta(minutes=20)).cumsum()

my_groups = test1.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)

The output looks like this:

('X', 3456, 0)
   data   sat site                time  delta_t  track
0     0  3456    X 2016-01-01 00:00:00   0 days      0
1     1  3456    X 2016-01-01 00:00:01   0 days      0
2     2  3456    X 2016-01-01 00:00:02   0 days      0
3     3  3456    X 2016-01-01 00:00:03   0 days      0
4     4  3456    X 2016-01-01 00:00:04   0 days      0
('X', 12345, 0)
   data    sat site                time  delta_t  track
0     0  12345    X 2016-01-01 00:00:00      NaT      0
1     1  12345    X 2016-01-01 00:00:01 00:00:01      0
2     2  12345    X 2016-01-01 00:00:02 00:00:01      0
3     3  12345    X 2016-01-01 00:00:03 00:00:01      0
4     4  12345    X 2016-01-01 00:00:04 00:00:01      0
5     5  12345    X 2016-01-01 00:00:05 00:00:01      0
6     6  12345    X 2016-01-01 00:00:06 00:00:01      0
7     7  12345    X 2016-01-01 00:00:07 00:00:01      0
8     8  12345    X 2016-01-01 00:00:08 00:00:01      0
9     9  12345    X 2016-01-01 00:00:09 00:00:01      0
('Y', 3456, 0)
   data   sat site                time  delta_t  track
0     0  3456    Y 2016-01-01 00:01:00 00:00:00      0
1     1  3456    Y 2016-01-01 00:01:01 00:00:01      0
2     2  3456    Y 2016-01-01 00:01:02 00:00:01      0
3     3  3456    Y 2016-01-01 00:01:03 00:00:01      0
4     4  3456    Y 2016-01-01 00:01:04 00:00:01      0
5     5  3456    Y 2016-01-01 00:01:05 00:00:01      0
6     6  3456    Y 2016-01-01 00:01:06 00:00:01      0
7     7  3456    Y 2016-01-01 00:01:07 00:00:01      0
8     8  3456    Y 2016-01-01 00:01:08 00:00:01      0
9     9  3456    Y 2016-01-01 00:01:09 00:00:01      0
0     0  3456    Y 2016-01-01 01:00:00 00:01:00      0
1     1  3456    Y 2016-01-01 01:00:01 00:00:01      0
2     2  3456    Y 2016-01-01 01:00:02 00:00:01      0
3     3  3456    Y 2016-01-01 01:00:03 00:00:01      0
4     4  3456    Y 2016-01-01 01:00:04 00:00:01      0
('Y', 3456, 1)
   data   sat site                time  delta_t  track
0     0  3456    Y 2016-01-02 00:00:00 22:59:56      1
1     1  3456    Y 2016-01-02 00:01:00 00:01:00      1
2     2  3456    Y 2016-01-02 00:02:00 00:01:00      1
3     3  3456    Y 2016-01-02 00:03:00 00:01:00      1
4     4  3456    Y 2016-01-02 00:04:00 00:01:00      1
('Z', 3456, 0)
    data   sat site                time  delta_t  track
0      0  3456    Z 2016-01-01 00:00:00 00:00:00      0
1      1  3456    Z 2016-01-01 00:01:00 00:00:51      0
2      2  3456    Z 2016-01-01 00:02:00 00:00:51      0
3      3  3456    Z 2016-01-01 00:03:00 00:01:00      0
4      4  3456    Z 2016-01-01 00:04:00 00:01:00      0
5      5  3456    Z 2016-01-01 00:05:00 00:01:00      0
6      6  3456    Z 2016-01-01 00:06:00 00:01:00      0
7      7  3456    Z 2016-01-01 00:07:00 00:01:00      0
8      8  3456    Z 2016-01-01 00:08:00 00:01:00      0
9      9  3456    Z 2016-01-01 00:09:00 00:01:00      0
10    10  3456    Z 2016-01-01 00:10:00 00:01:00      0
11    11  3456    Z 2016-01-01 00:11:00 00:01:00      0
12    12  3456    Z 2016-01-01 00:12:00 00:01:00      0
13    13  3456    Z 2016-01-01 00:13:00 00:01:00      0
14    14  3456    Z 2016-01-01 00:14:00 00:01:00      0
15    15  3456    Z 2016-01-01 00:15:00 00:01:00      0
16    16  3456    Z 2016-01-01 00:16:00 00:01:00      0
17    17  3456    Z 2016-01-01 00:17:00 00:01:00      0
18    18  3456    Z 2016-01-01 00:18:00 00:01:00      0
19    19  3456    Z 2016-01-01 00:19:00 00:01:00      0
20    20  3456    Z 2016-01-01 00:20:00 00:01:00      0
21    21  3456    Z 2016-01-01 00:21:00 00:01:00      0
22    22  3456    Z 2016-01-01 00:22:00 00:01:00      0
23    23  3456    Z 2016-01-01 00:23:00 00:01:00      0
24    24  3456    Z 2016-01-01 00:24:00 00:01:00      0
25    25  3456    Z 2016-01-01 00:25:00 00:01:00      0
26    26  3456    Z 2016-01-01 00:26:00 00:01:00      0
27    27  3456    Z 2016-01-01 00:27:00 00:01:00      0
28    28  3456    Z 2016-01-01 00:28:00 00:01:00      0
29    29  3456    Z 2016-01-01 00:29:00 00:01:00      0
30    30  3456    Z 2016-01-01 00:30:00 00:01:00      0
31    31  3456    Z 2016-01-01 00:31:00 00:01:00      0
32    32  3456    Z 2016-01-01 00:32:00 00:01:00      0
33    33  3456    Z 2016-01-01 00:33:00 00:01:00      0
34    34  3456    Z 2016-01-01 00:34:00 00:01:00      0
35    35  3456    Z 2016-01-01 00:35:00 00:01:00      0
36    36  3456    Z 2016-01-01 00:36:00 00:01:00      0
37    37  3456    Z 2016-01-01 00:37:00 00:01:00      0
38    38  3456    Z 2016-01-01 00:38:00 00:01:00      0
39    39  3456    Z 2016-01-01 00:39:00 00:01:00      0
40    40  3456    Z 2016-01-01 00:40:00 00:01:00      0
41    41  3456    Z 2016-01-01 00:41:00 00:01:00      0
42    42  3456    Z 2016-01-01 00:42:00 00:01:00      0
43    43  3456    Z 2016-01-01 00:43:00 00:01:00      0
44    44  3456    Z 2016-01-01 00:44:00 00:01:00      0
45    45  3456    Z 2016-01-01 00:45:00 00:01:00      0
46    46  3456    Z 2016-01-01 00:46:00 00:01:00      0
47    47  3456    Z 2016-01-01 00:47:00 00:01:00      0
48    48  3456    Z 2016-01-01 00:48:00 00:01:00      0
49    49  3456    Z 2016-01-01 00:49:00 00:01:00      0
50    50  3456    Z 2016-01-01 00:50:00 00:01:00      0
51    51  3456    Z 2016-01-01 00:51:00 00:01:00      0
52    52  3456    Z 2016-01-01 00:52:00 00:01:00      0
53    53  3456    Z 2016-01-01 00:53:00 00:01:00      0
54    54  3456    Z 2016-01-01 00:54:00 00:01:00      0
55    55  3456    Z 2016-01-01 00:55:00 00:01:00      0
56    56  3456    Z 2016-01-01 00:56:00 00:01:00      0
57    57  3456    Z 2016-01-01 00:57:00 00:01:00      0
58    58  3456    Z 2016-01-01 00:58:00 00:01:00      0
59    59  3456    Z 2016-01-01 00:59:00 00:01:00      0

Note that the ('Y', 3456, 0) group really does contain two tracks, so this is not a complete solution. Continuing on, I tried this:
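To see why the global diff hides the gap, here is a small sketch on made-up data (the sites, times, and variable names are assumptions, not the test set above). Rows from another site that fall inside the gap keep every row-to-row difference in the sorted frame short, while the per-group difference still exposes it.

```python
import pandas as pd

# Hypothetical data: site Y has a 59-minute gap, site Z reports every 10 minutes.
y = pd.DataFrame({
    "site": ["Y"] * 4,
    "time": pd.to_datetime([
        "2016-01-01 00:00", "2016-01-01 00:01",
        "2016-01-01 01:00", "2016-01-01 01:01",
    ]),
})
z = pd.DataFrame({
    "site": ["Z"] * 7,
    "time": pd.date_range("2016-01-01 00:00", periods=7, freq="10min"),
})

df = pd.concat([y, z], ignore_index=True).sort_values("time").reset_index(drop=True)
limit = pd.Timedelta(minutes=20)

# Difference to the previous row in the whole frame, regardless of site.
global_delta = df["time"].diff()

# Difference to the previous row of the same site.
group_delta = df.groupby("site")["time"].diff()

print("gaps found globally: ", int((global_delta > limit).sum()))
print("gaps found per group:", int((group_delta > limit).sum()))
```

The global calculation finds no gap at all, because the Z rows fill the hole in Y's timeline; the grouped calculation finds the one gap in Y.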

# In[228]:

# This method works. The difference is that when I calculated the 
# delta_t I did it on the results of the groupby.

# There's an undesirable effect: the track counter doesn't reset with each new
# (site, sat) pair. It keeps counting up across groups.

test2 = test.copy()
test2.sort_values(['site', 'sat', 'time'], inplace=True)
test2['delta_t'] = test2.groupby(['site', 'sat'])['time'].diff()       
test2['track'] = (test2['delta_t'] > pd.Timedelta(minutes=20)).cumsum()

my_groups = test2.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)

which has the output:

('X', 3456, 0)
   data   sat site                time  delta_t  track
0     0  3456    X 2016-01-01 00:00:00      NaT      0
1     1  3456    X 2016-01-01 00:00:01 00:00:01      0
2     2  3456    X 2016-01-01 00:00:02 00:00:01      0
3     3  3456    X 2016-01-01 00:00:03 00:00:01      0
4     4  3456    X 2016-01-01 00:00:04 00:00:01      0
('X', 12345, 0)
   data    sat site                time  delta_t  track
0     0  12345    X 2016-01-01 00:00:00      NaT      0
1     1  12345    X 2016-01-01 00:00:01 00:00:01      0
2     2  12345    X 2016-01-01 00:00:02 00:00:01      0
3     3  12345    X 2016-01-01 00:00:03 00:00:01      0
4     4  12345    X 2016-01-01 00:00:04 00:00:01      0
5     5  12345    X 2016-01-01 00:00:05 00:00:01      0
6     6  12345    X 2016-01-01 00:00:06 00:00:01      0
7     7  12345    X 2016-01-01 00:00:07 00:00:01      0
8     8  12345    X 2016-01-01 00:00:08 00:00:01      0
9     9  12345    X 2016-01-01 00:00:09 00:00:01      0
('Y', 3456, 0)
   data   sat site                time  delta_t  track
0     0  3456    Y 2016-01-01 00:01:00      NaT      0
1     1  3456    Y 2016-01-01 00:01:01 00:00:01      0
2     2  3456    Y 2016-01-01 00:01:02 00:00:01      0
3     3  3456    Y 2016-01-01 00:01:03 00:00:01      0
4     4  3456    Y 2016-01-01 00:01:04 00:00:01      0
5     5  3456    Y 2016-01-01 00:01:05 00:00:01      0
6     6  3456    Y 2016-01-01 00:01:06 00:00:01      0
7     7  3456    Y 2016-01-01 00:01:07 00:00:01      0
8     8  3456    Y 2016-01-01 00:01:08 00:00:01      0
9     9  3456    Y 2016-01-01 00:01:09 00:00:01      0
('Y', 3456, 1)
   data   sat site                time  delta_t  track
0     0  3456    Y 2016-01-01 01:00:00 00:58:51      1
1     1  3456    Y 2016-01-01 01:00:01 00:00:01      1
2     2  3456    Y 2016-01-01 01:00:02 00:00:01      1
3     3  3456    Y 2016-01-01 01:00:03 00:00:01      1
4     4  3456    Y 2016-01-01 01:00:04 00:00:01      1
('Y', 3456, 2)
   data   sat site                time  delta_t  track
0     0  3456    Y 2016-01-02 00:00:00 22:59:56      2
1     1  3456    Y 2016-01-02 00:01:00 00:01:00      2
2     2  3456    Y 2016-01-02 00:02:00 00:01:00      2
3     3  3456    Y 2016-01-02 00:03:00 00:01:00      2
4     4  3456    Y 2016-01-02 00:04:00 00:01:00      2
('Z', 3456, 2)
    data   sat site                time  delta_t  track
0      0  3456    Z 2016-01-01 00:00:00      NaT      2
1      1  3456    Z 2016-01-01 00:01:00 00:01:00      2
2      2  3456    Z 2016-01-01 00:02:00 00:01:00      2
3      3  3456    Z 2016-01-01 00:03:00 00:01:00      2
4      4  3456    Z 2016-01-01 00:04:00 00:01:00      2
5      5  3456    Z 2016-01-01 00:05:00 00:01:00      2
6      6  3456    Z 2016-01-01 00:06:00 00:01:00      2
7      7  3456    Z 2016-01-01 00:07:00 00:01:00      2
8      8  3456    Z 2016-01-01 00:08:00 00:01:00      2
9      9  3456    Z 2016-01-01 00:09:00 00:01:00      2
10    10  3456    Z 2016-01-01 00:10:00 00:01:00      2
11    11  3456    Z 2016-01-01 00:11:00 00:01:00      2
12    12  3456    Z 2016-01-01 00:12:00 00:01:00      2
13    13  3456    Z 2016-01-01 00:13:00 00:01:00      2
14    14  3456    Z 2016-01-01 00:14:00 00:01:00      2
15    15  3456    Z 2016-01-01 00:15:00 00:01:00      2
16    16  3456    Z 2016-01-01 00:16:00 00:01:00      2
17    17  3456    Z 2016-01-01 00:17:00 00:01:00      2
18    18  3456    Z 2016-01-01 00:18:00 00:01:00      2
19    19  3456    Z 2016-01-01 00:19:00 00:01:00      2
20    20  3456    Z 2016-01-01 00:20:00 00:01:00      2
21    21  3456    Z 2016-01-01 00:21:00 00:01:00      2
22    22  3456    Z 2016-01-01 00:22:00 00:01:00      2
23    23  3456    Z 2016-01-01 00:23:00 00:01:00      2
24    24  3456    Z 2016-01-01 00:24:00 00:01:00      2
25    25  3456    Z 2016-01-01 00:25:00 00:01:00      2
26    26  3456    Z 2016-01-01 00:26:00 00:01:00      2
27    27  3456    Z 2016-01-01 00:27:00 00:01:00      2
28    28  3456    Z 2016-01-01 00:28:00 00:01:00      2
29    29  3456    Z 2016-01-01 00:29:00 00:01:00      2
30    30  3456    Z 2016-01-01 00:30:00 00:01:00      2
31    31  3456    Z 2016-01-01 00:31:00 00:01:00      2
32    32  3456    Z 2016-01-01 00:32:00 00:01:00      2
33    33  3456    Z 2016-01-01 00:33:00 00:01:00      2
34    34  3456    Z 2016-01-01 00:34:00 00:01:00      2
35    35  3456    Z 2016-01-01 00:35:00 00:01:00      2
36    36  3456    Z 2016-01-01 00:36:00 00:01:00      2
37    37  3456    Z 2016-01-01 00:37:00 00:01:00      2
38    38  3456    Z 2016-01-01 00:38:00 00:01:00      2
39    39  3456    Z 2016-01-01 00:39:00 00:01:00      2
40    40  3456    Z 2016-01-01 00:40:00 00:01:00      2
41    41  3456    Z 2016-01-01 00:41:00 00:01:00      2
42    42  3456    Z 2016-01-01 00:42:00 00:01:00      2
43    43  3456    Z 2016-01-01 00:43:00 00:01:00      2
44    44  3456    Z 2016-01-01 00:44:00 00:01:00      2
45    45  3456    Z 2016-01-01 00:45:00 00:01:00      2
46    46  3456    Z 2016-01-01 00:46:00 00:01:00      2
47    47  3456    Z 2016-01-01 00:47:00 00:01:00      2
48    48  3456    Z 2016-01-01 00:48:00 00:01:00      2
49    49  3456    Z 2016-01-01 00:49:00 00:01:00      2
50    50  3456    Z 2016-01-01 00:50:00 00:01:00      2
51    51  3456    Z 2016-01-01 00:51:00 00:01:00      2
52    52  3456    Z 2016-01-01 00:52:00 00:01:00      2
53    53  3456    Z 2016-01-01 00:53:00 00:01:00      2
54    54  3456    Z 2016-01-01 00:54:00 00:01:00      2
55    55  3456    Z 2016-01-01 00:55:00 00:01:00      2
56    56  3456    Z 2016-01-01 00:56:00 00:01:00      2
57    57  3456    Z 2016-01-01 00:57:00 00:01:00      2
58    58  3456    Z 2016-01-01 00:58:00 00:01:00      2
59    59  3456    Z 2016-01-01 00:59:00 00:01:00      2

The track breakdown is correct, but it would be nice if the track counter reset to 0 for each new (site, sat) pair.

# In[229]:

# This method works. The difference is that when I calculate the
# "track" counter I'm doing the cumulative sum on the results of 
# the groupby. It also resets the track counter with each new 
# site, sat group.

test3 = test.copy()

test3.sort_values(['site', 'sat', 'time'], inplace=True)
test3['delta_t'] = test3.groupby(['site', 'sat'])['time'].diff()    

# calculate an intermediate flag column. If you try to eliminate this
# and put the boolean test directly into the 'track' calculation pandas
# will throw a recursion error.

test3['new_track'] = test3['delta_t'] > pd.Timedelta(minutes=20)

# The to_numeric call is used to convert from a float to an integer.

test3['track'] = pd.to_numeric(test3.groupby(['site', 'sat'])['new_track'].cumsum(), downcast='integer')

my_groups = test3.groupby(by = ['site', 'sat', 'track'])
for key, g in my_groups:
    print(key)
    print(g)

which produces the following output:

('X', 3456, 0)
   data   sat site                time  delta_t new_track  track
0     0  3456    X 2016-01-01 00:00:00      NaT     False      0
1     1  3456    X 2016-01-01 00:00:01 00:00:01     False      0
2     2  3456    X 2016-01-01 00:00:02 00:00:01     False      0
3     3  3456    X 2016-01-01 00:00:03 00:00:01     False      0
4     4  3456    X 2016-01-01 00:00:04 00:00:01     False      0
('X', 12345, 0)
   data    sat site                time  delta_t new_track  track
0     0  12345    X 2016-01-01 00:00:00      NaT     False      0
1     1  12345    X 2016-01-01 00:00:01 00:00:01     False      0
2     2  12345    X 2016-01-01 00:00:02 00:00:01     False      0
3     3  12345    X 2016-01-01 00:00:03 00:00:01     False      0
4     4  12345    X 2016-01-01 00:00:04 00:00:01     False      0
5     5  12345    X 2016-01-01 00:00:05 00:00:01     False      0
6     6  12345    X 2016-01-01 00:00:06 00:00:01     False      0
7     7  12345    X 2016-01-01 00:00:07 00:00:01     False      0
8     8  12345    X 2016-01-01 00:00:08 00:00:01     False      0
9     9  12345    X 2016-01-01 00:00:09 00:00:01     False      0
('Y', 3456, 0)
   data   sat site                time  delta_t new_track  track
0     0  3456    Y 2016-01-01 00:01:00      NaT     False      0
1     1  3456    Y 2016-01-01 00:01:01 00:00:01     False      0
2     2  3456    Y 2016-01-01 00:01:02 00:00:01     False      0
3     3  3456    Y 2016-01-01 00:01:03 00:00:01     False      0
4     4  3456    Y 2016-01-01 00:01:04 00:00:01     False      0
5     5  3456    Y 2016-01-01 00:01:05 00:00:01     False      0
6     6  3456    Y 2016-01-01 00:01:06 00:00:01     False      0
7     7  3456    Y 2016-01-01 00:01:07 00:00:01     False      0
8     8  3456    Y 2016-01-01 00:01:08 00:00:01     False      0
9     9  3456    Y 2016-01-01 00:01:09 00:00:01     False      0
('Y', 3456, 1)
   data   sat site                time  delta_t new_track  track
0     0  3456    Y 2016-01-01 01:00:00 00:58:51      True      1
1     1  3456    Y 2016-01-01 01:00:01 00:00:01     False      1
2     2  3456    Y 2016-01-01 01:00:02 00:00:01     False      1
3     3  3456    Y 2016-01-01 01:00:03 00:00:01     False      1
4     4  3456    Y 2016-01-01 01:00:04 00:00:01     False      1
('Y', 3456, 2)
   data   sat site                time  delta_t new_track  track
0     0  3456    Y 2016-01-02 00:00:00 22:59:56      True      2
1     1  3456    Y 2016-01-02 00:01:00 00:01:00     False      2
2     2  3456    Y 2016-01-02 00:02:00 00:01:00     False      2
3     3  3456    Y 2016-01-02 00:03:00 00:01:00     False      2
4     4  3456    Y 2016-01-02 00:04:00 00:01:00     False      2
('Z', 3456, 0)
    data   sat site                time  delta_t new_track  track
0      0  3456    Z 2016-01-01 00:00:00      NaT     False      0
1      1  3456    Z 2016-01-01 00:01:00 00:01:00     False      0
2      2  3456    Z 2016-01-01 00:02:00 00:01:00     False      0
3      3  3456    Z 2016-01-01 00:03:00 00:01:00     False      0
4      4  3456    Z 2016-01-01 00:04:00 00:01:00     False      0
5      5  3456    Z 2016-01-01 00:05:00 00:01:00     False      0
6      6  3456    Z 2016-01-01 00:06:00 00:01:00     False      0
7      7  3456    Z 2016-01-01 00:07:00 00:01:00     False      0
8      8  3456    Z 2016-01-01 00:08:00 00:01:00     False      0
9      9  3456    Z 2016-01-01 00:09:00 00:01:00     False      0
10    10  3456    Z 2016-01-01 00:10:00 00:01:00     False      0
11    11  3456    Z 2016-01-01 00:11:00 00:01:00     False      0
12    12  3456    Z 2016-01-01 00:12:00 00:01:00     False      0
13    13  3456    Z 2016-01-01 00:13:00 00:01:00     False      0
14    14  3456    Z 2016-01-01 00:14:00 00:01:00     False      0
15    15  3456    Z 2016-01-01 00:15:00 00:01:00     False      0
16    16  3456    Z 2016-01-01 00:16:00 00:01:00     False      0
17    17  3456    Z 2016-01-01 00:17:00 00:01:00     False      0
18    18  3456    Z 2016-01-01 00:18:00 00:01:00     False      0
19    19  3456    Z 2016-01-01 00:19:00 00:01:00     False      0
20    20  3456    Z 2016-01-01 00:20:00 00:01:00     False      0
21    21  3456    Z 2016-01-01 00:21:00 00:01:00     False      0
22    22  3456    Z 2016-01-01 00:22:00 00:01:00     False      0
23    23  3456    Z 2016-01-01 00:23:00 00:01:00     False      0
24    24  3456    Z 2016-01-01 00:24:00 00:01:00     False      0
25    25  3456    Z 2016-01-01 00:25:00 00:01:00     False      0
26    26  3456    Z 2016-01-01 00:26:00 00:01:00     False      0
27    27  3456    Z 2016-01-01 00:27:00 00:01:00     False      0
28    28  3456    Z 2016-01-01 00:28:00 00:01:00     False      0
29    29  3456    Z 2016-01-01 00:29:00 00:01:00     False      0
30    30  3456    Z 2016-01-01 00:30:00 00:01:00     False      0
31    31  3456    Z 2016-01-01 00:31:00 00:01:00     False      0
32    32  3456    Z 2016-01-01 00:32:00 00:01:00     False      0
33    33  3456    Z 2016-01-01 00:33:00 00:01:00     False      0
34    34  3456    Z 2016-01-01 00:34:00 00:01:00     False      0
35    35  3456    Z 2016-01-01 00:35:00 00:01:00     False      0
36    36  3456    Z 2016-01-01 00:36:00 00:01:00     False      0
37    37  3456    Z 2016-01-01 00:37:00 00:01:00     False      0
38    38  3456    Z 2016-01-01 00:38:00 00:01:00     False      0
39    39  3456    Z 2016-01-01 00:39:00 00:01:00     False      0
40    40  3456    Z 2016-01-01 00:40:00 00:01:00     False      0
41    41  3456    Z 2016-01-01 00:41:00 00:01:00     False      0
42    42  3456    Z 2016-01-01 00:42:00 00:01:00     False      0
43    43  3456    Z 2016-01-01 00:43:00 00:01:00     False      0
44    44  3456    Z 2016-01-01 00:44:00 00:01:00     False      0
45    45  3456    Z 2016-01-01 00:45:00 00:01:00     False      0
46    46  3456    Z 2016-01-01 00:46:00 00:01:00     False      0
47    47  3456    Z 2016-01-01 00:47:00 00:01:00     False      0
48    48  3456    Z 2016-01-01 00:48:00 00:01:00     False      0
49    49  3456    Z 2016-01-01 00:49:00 00:01:00     False      0
50    50  3456    Z 2016-01-01 00:50:00 00:01:00     False      0
51    51  3456    Z 2016-01-01 00:51:00 00:01:00     False      0
52    52  3456    Z 2016-01-01 00:52:00 00:01:00     False      0
53    53  3456    Z 2016-01-01 00:53:00 00:01:00     False      0
54    54  3456    Z 2016-01-01 00:54:00 00:01:00     False      0
55    55  3456    Z 2016-01-01 00:55:00 00:01:00     False      0
56    56  3456    Z 2016-01-01 00:56:00 00:01:00     False      0
57    57  3456    Z 2016-01-01 00:57:00 00:01:00     False      0
58    58  3456    Z 2016-01-01 00:58:00 00:01:00     False      0
59    59  3456    Z 2016-01-01 00:59:00 00:01:00     False      0

So I think I now have a good solution. It is odd that pandas throws a recursion error if you try to fold the new_track test directly into the track calculation.
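As a sketch only, the same steps could be wrapped in a small helper. The function name add_track_column and its parameters are hypothetical, and the sample frame is made up. It keeps the gap flag as a separate Series and groups that Series by the key columns, so no intermediate column is stored on the frame; casting the flag to int before the cumulative sum should also make the to_numeric call unnecessary. I have not checked whether this avoids the recursion error on the pandas version used above.

```python
import pandas as pd


def add_track_column(df, keys=("site", "sat"), time_col="time",
                     max_gap=pd.Timedelta(minutes=20)):
    """Return a sorted copy of df with an integer 'track' column.

    The counter restarts at 0 for every combination of the key columns.
    """
    keys = list(keys)
    out = df.sort_values(keys + [time_col]).reset_index(drop=True)

    # Gap to the previous row within the same group; NaT for a group's first row.
    delta = out.groupby(keys)[time_col].diff()

    # NaT compares as False, so the first row of a group never starts a new track.
    new_track = (delta > max_gap).astype(int)

    # Group the flag Series by the key columns so the running count restarts per group.
    out["track"] = new_track.groupby([out[k] for k in keys]).cumsum()
    return out


# Hypothetical sample: one group with a gap of about an hour, one group without gaps.
sample = pd.DataFrame({
    "site": ["Y"] * 4 + ["Z"] * 3,
    "sat": [3456] * 7,
    "time": pd.to_datetime([
        "2016-01-01 00:01:00", "2016-01-01 00:01:01",
        "2016-01-01 01:00:00", "2016-01-01 01:00:01",
        "2016-01-01 00:00:00", "2016-01-01 00:01:00", "2016-01-01 00:02:00",
    ]),
})

tracked = add_track_column(sample)
for key, g in tracked.groupby(["site", "sat", "track"]):
    print(key, len(g))
```

The resulting track column can then be used as the third groupby key, which is what the question originally asked for.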