Question

I have a dataframe in Python. Its columns are Id, loc_time, loc_number and status.

The data looks like this:

Id  loc_time    loc_number  status
1   01:25.5     1105        testing on
2   02:25.9     1105        testing off
3   03:28.5     1105        testing off
4   04:25.5     1105        testing off
5   05:25.9     1105        testing on
6   06:25.5     1105        testing on
7   07:25.9     1105        testing off
8   08:25.6     1105        testing off
9   09:25.9     1106        testing on
10  10:25.6     1105        testing on
11  11:26.0     1105        testing off
12  12:25.6     1105        testing off
13  13:26.0     1105        testing on
14  14:25.6     1106        testing on
15  15:26.0     1105        testing off
16  16:25.6     1105        testing off
17  17:26.0     1105        testing on
18  18:25.7     1105        testing on
19  19:26.0     1105        testing off
20  20:25.7     1105        testing off
21  21:26.1     1105        testing on
22  22:25.7     1106        testing on
23  22:33.7     1107        testing on
24  23:26.1     1105        testing off
25  24:25.7     1105        testing off
26  25:26.1     1105        testing on
27  27:25.7     1105        testing on
28  22:35.7     1106        testing off

Now I want to create a new dataframe with the columns Id, loc_time, loc_number, status and count.

Id  loc_time    loc_number  status          count
1   03:28.5     1105        testing on      03
2   06:25.5     1105        testing         03
3   10:25.6     1105        testing         03
4   13:26.0     1105        testing         03
5   17:26.0     1105        testing         03
6   20:25.7     1105        testing         03
7   24:25.7     1105        testing         03
8   27:25.7     1105        testing off     02
9   22:25.7     1106        testing on      03
10  22:35.7     1106        testing off     01
11  22:33.7     1107        testing on      01

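I want to group the first ten timestamp records into a single record, assign it the status testing on, and also count the number of records in it.

I want to do the same for the next ten records, and assign them the status testing.

For the last group of data, I want the status to be testing off.

How can I do this?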

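When timestamps 1 to 10 are grouped together for the same loc_number, the status is testing on.

If more than 10 timestamps follow timestamps 1-10 for the same loc_number, the status is testing, and so on.

If fewer than 10 timestamps follow the previous group of 10 for the same loc_number, the status is testing off.

The last timestamp in each group is the one that should be kept as loc_time.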

Answer 1

This should work now. If you don't want the dataframe indexed on that column, you can always delete df2 = df2.set_index('ID') (the last line).

First, I need to sort the dataframe by loc_number and then by loc_time.
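A minimal sketch of the sorting step, on a made-up four-row frame. It assumes loc_time is stored as zero-padded strings like those in the question, so that plain string ordering matches chronological ordering; if that does not hold, the column would need parsing first.

```python
import pandas as pd

# Illustrative rows only (not the asker's full data).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "loc_time": ["02:25.9", "01:25.5", "09:25.9", "03:28.5"],
    "loc_number": [1105, 1105, 1106, 1105],
})

# Sort by location first, then by time within each location.
# Assumption: zero-padded "mm:ss.s" strings sort chronologically.
df_sorted = df.sort_values(["loc_number", "loc_time"]).reset_index(drop=True)
print(df_sorted)
```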

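Next, I need to create blocks of consecutive numbers for these unequal-sized groups (e.g. 1,1,1,2,2,1,1,1,2,2,2,3,3, assuming two loc_numbers). To do this, I group on loc_number and perform a transform that uses floor division: a list comprehension divides each item's index by the group size (e.g. 3).

transform(lambda group: [i // group_size for i in range(len(group))])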
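As a rough illustration of the block numbering, here is a sketch that swaps the lambda transform for `groupby(...).cumcount()`, which gives each row's position inside its loc_number; floor-dividing that position by the chunk size yields the same kind of counter. The column name `loc_counter` and the value `group_size = 3` are assumptions for the example, and the counter starts at 0 rather than 1.

```python
import pandas as pd

group_size = 3  # assumed chunk size; the question's sample output groups by 3

# Illustrative, already-sorted rows: five for 1105 and two for 1106.
df = pd.DataFrame({
    "loc_number": [1105] * 5 + [1106] * 2,
    "loc_time": ["01:25.5", "02:25.9", "03:28.5", "04:25.5",
                 "05:25.9", "09:25.9", "14:25.6"],
})

# cumcount() numbers rows 0, 1, 2, ... within each loc_number;
# floor division turns that into a per-location chunk number.
df["loc_counter"] = df.groupby("loc_number").cumcount() // group_size
print(df)
```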

Next, I group on loc_number and the new loc_counter to carry out the remaining aggregations.

I use a list comprehension to get the first and last item of each group. Then I use .loc to set the status to testing_on or testing_off as needed.
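Putting the steps together, one possible end-to-end sketch looks like the following. It is not the original answer's code: the function name `summarize`, the `loc_counter` column and the default `group_size=3` are assumptions, named aggregation replaces the list comprehension, and count comes out as an integer rather than the zero-padded "03" shown in the question. A loc_number with only one chunk is labelled testing on, which matches loc_number 1107 in the desired output.

```python
import pandas as pd

def summarize(df, group_size=3):
    """Collapse each run of `group_size` rows per loc_number into one row."""
    df = df.sort_values(["loc_number", "loc_time"])
    df["loc_counter"] = df.groupby("loc_number").cumcount() // group_size

    # One row per (loc_number, chunk): last timestamp and number of records.
    out = df.groupby(["loc_number", "loc_counter"], as_index=False).agg(
        loc_time=("loc_time", "last"),
        count=("Id", "count"),
    )

    # Middle chunks are "testing"; the last chunk is "testing off";
    # the first chunk is "testing on" (applied last so it wins for
    # locations that have a single chunk).
    out["status"] = "testing"
    last_chunk = out.groupby("loc_number")["loc_counter"].transform("max")
    out.loc[out["loc_counter"] == last_chunk, "status"] = "testing off"
    out.loc[out["loc_counter"] == 0, "status"] = "testing on"

    out["Id"] = list(range(1, len(out) + 1))
    return out[["Id", "loc_time", "loc_number", "status", "count"]]

# Illustrative subset of the question's data.
demo = pd.DataFrame({
    "Id": list(range(1, 14)),
    "loc_time": ["01:25.5", "02:25.9", "03:28.5", "04:25.5", "05:25.9",
                 "06:25.5", "07:25.9", "08:25.6", "09:25.9", "14:25.6",
                 "22:25.7", "22:35.7", "22:33.7"],
    "loc_number": [1105] * 8 + [1106] * 4 + [1107],
})
result = summarize(demo)
print(result)
```

If the result should be indexed on Id, `result.set_index("Id")` can be applied afterwards; leaving it out keeps Id as an ordinary column.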

Grouping multiple records into one record and assigning values in a Python dataframe
