Question

我正在为员工构建一个基本的rota / schedule，并且有一个来自MySQL游标的DataFrame，它提供了ID，日期和类的列表

        id             the_date  class
0   195593  2017-09-12 14:00:00      3
1   193972  2017-09-13 09:15:00      2
2   195594  2017-09-13 14:00:00      3
3   195595  2017-09-15 14:00:00      3
4   193947  2017-09-16 17:30:00      3
5   195627  2017-09-17 08:00:00      2
6   193948  2017-09-19 11:30:00      2
7   195628  2017-09-21 08:00:00      2
8   193949  2017-09-21 11:30:00      2
9   195629  2017-09-24 08:00:00      2
10  193950  2017-09-24 10:00:00      2
11  193951  2017-09-27 11:30:00      2
12  195644  2017-09-28 06:00:00      1
13  194400  2017-09-28 08:00:00      1
14  195630  2017-09-28 08:00:00      2
15  193952  2017-09-29 11:30:00      2
16  195631  2017-10-01 08:00:00      2
17  194401  2017-10-06 08:00:00      1
18  195645  2017-10-06 10:00:00      1
19  195632  2017-10-07 13:30:00      3

如果类== 1，我需要将该实例重复5次。

first_class = df[df['class'] == 1]
non_first_class = df[df['class'] != 1]
first_class_replicated = pd.concat([tests_df]*5,ignore_index=True).sort_values(['the_date'])

    id             the_date  class
0   195644  2017-09-28 06:00:00      1
16  195644  2017-09-28 06:00:00      1
4   195644  2017-09-28 06:00:00      1
12  195644  2017-09-28 06:00:00      1
8   195644  2017-09-28 06:00:00      1
17  194400  2017-09-28 08:00:00      1
13  194400  2017-09-28 08:00:00      1
9   194400  2017-09-28 08:00:00      1
5   194400  2017-09-28 08:00:00      1
1   194400  2017-09-28 08:00:00      1
6   194401  2017-10-06 08:00:00      1
18  194401  2017-10-06 08:00:00      1
10  194401  2017-10-06 08:00:00      1
14  194401  2017-10-06 08:00:00      1
2   194401  2017-10-06 08:00:00      1
11  195645  2017-10-06 10:00:00      1
3   195645  2017-10-06 10:00:00      1
15  195645  2017-10-06 10:00:00      1
7   195645  2017-10-06 10:00:00      1
19  195645  2017-10-06 10:00:00      1

然后我合并non_first_class和first_class_replicated。在此之前，我需要first_class_replicated中的日期增加一天，按ID分组。以下是我需要它的样子。是否有一个优雅的熊猫解决方案，或者我应该考虑循环groupby系列来修改日期？

所需：

id      
0   195644  2017-09-28 6:00:00
16  195644  2017-09-29 6:00:00
4   195644  2017-09-30 6:00:00
12  195644  2017-10-01 6:00:00
8   195644  2017-10-02 6:00:00
17  194400  2017-09-28 8:00:00
13  194400  2017-09-29 8:00:00
9   194400  2017-09-30 8:00:00
5   194400  2017-10-01 8:00:00
1   194400  2017-10-02 8:00:00
6   194401  2017-10-06 8:00:00
18  194401  2017-10-07 8:00:00
10  194401  2017-10-08 8:00:00
14  194401  2017-10-09 8:00:00
2   194401  2017-10-10 8:00:00
11  195645  2017-10-06 10:00:00
3   195645  2017-10-07 10:00:00
15  195645  2017-10-08 10:00:00
7   195645  2017-10-09 10:00:00
19  195645  2017-10-10 10:00:00

Answer 1

您可以将cumcount用于计数类别，然后转换to_timedelta并添加到列：

#another solution for repeat
first_class_replicated = first_class.loc[np.repeat(first_class.index, 5)]
                                    .sort_values(['the_date'])

df1 = first_class_replicated.groupby('id').cumcount()
first_class_replicated['the_date'] += pd.to_timedelta(df1, unit='D')
print (first_class_replicated)
        id            the_date  class
0   195644 2017-09-28 06:00:00      1
16  195644 2017-09-29 06:00:00      1
4   195644 2017-09-30 06:00:00      1
12  195644 2017-10-01 06:00:00      1
8   195644 2017-10-02 06:00:00      1
17  194400 2017-09-28 08:00:00      1
13  194400 2017-09-29 08:00:00      1
9   194400 2017-09-30 08:00:00      1
5   194400 2017-10-01 08:00:00      1
1   194400 2017-10-02 08:00:00      1
6   194401 2017-10-06 08:00:00      1
18  194401 2017-10-07 08:00:00      1
10  194401 2017-10-08 08:00:00      1
14  194401 2017-10-09 08:00:00      1
2   194401 2017-10-10 08:00:00      1
11  195645 2017-10-06 10:00:00      1
3   195645 2017-10-07 10:00:00      1
15  195645 2017-10-08 10:00:00      1
7   195645 2017-10-09 10:00:00      1
19  195645 2017-10-10 10:00:00      1

在pandas groupby中增加日期

1 个答案: