Question

I have the following DataFrame in pandas:

 code     date         time        tank    
 123      01-01-2018   08:00:00    1
 123      01-01-2018   11:00:00    1
 123      01-01-2018   12:00:00    1
 123      01-01-2018   13:00:00    1
 123      01-01-2018   07:00:00    1
 123      01-01-2018   09:00:00    1
 124      01-01-2018   08:00:00    2
 124      01-01-2018   11:00:00    2
 124      01-01-2018   12:00:00    2
 124      01-01-2018   13:00:00    2
 124      01-01-2018   07:00:00    2
 124      01-01-2018   09:00:00    2

I am grouping it and sorting each group by 'time':

df = df.groupby(['code', 'date', 'tank']).apply(lambda x: x.sort_values(['time'], ascending=True)).reset_index()

When I call reset_index(), I get the following error:

ValueError: cannot insert tank, already exists

Answer 1

How about sorting by each of the grouping key columns, with 'time' in descending order?

df.sort_values(['code', 'date', 'tank', 'time'], ascending=[True]*3 + [False])

    code        date      time  tank
3    123  01-01-2018  13:00:00     1
2    123  01-01-2018  12:00:00     1
1    123  01-01-2018  11:00:00     1
5    123  01-01-2018  09:00:00     1
0    123  01-01-2018  08:00:00     1
4    123  01-01-2018  07:00:00     1
9    124  01-01-2018  13:00:00     2
8    124  01-01-2018  12:00:00     2
7    124  01-01-2018  11:00:00     2
11   124  01-01-2018  09:00:00     2
6    124  01-01-2018  08:00:00     2
10   124  01-01-2018  07:00:00     2

This achieves the same effect, but without a groupby.
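For anyone who wants to try the sort-only approach, here is a self-contained sketch. It rebuilds the frame from the question by hand and assumes the columns hold plain strings. The `ignore_index=True` argument is an addition not used above; it needs pandas 1.0 or later and simply renumbers the rows from 0.

```python
import pandas as pd

# Rebuild the example frame from the question (all values kept as strings/ints).
times = ["08:00:00", "11:00:00", "12:00:00", "13:00:00", "07:00:00", "09:00:00"]
df = pd.DataFrame({
    "code": [123] * 6 + [124] * 6,
    "date": ["01-01-2018"] * 12,
    "time": times * 2,
    "tank": [1] * 6 + [2] * 6,
})

# Sort the key columns ascending and 'time' descending in a single pass.
# ignore_index=True replaces the original row labels with 0..n-1.
result = df.sort_values(
    ["code", "date", "tank", "time"],
    ascending=[True, True, True, False],
    ignore_index=True,
)
print(result)
```

Note that sorting 'time' as a string only works here because the values are zero-padded. A 'date' stored as a dd-mm-yyyy string does not sort chronologically, so with more than one date it would probably be safer to convert it with pd.to_datetime first.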

If you do need the groupby, you need two reset_index calls (the first one drops the last index level):

(df.groupby(['code', 'date', 'tank'])
   .time.apply(lambda x: x.sort_values(ascending=False))
   .reset_index(level=-1, drop=True)
   .reset_index())

    code        date  tank      time
0    123  01-01-2018     1  13:00:00
1    123  01-01-2018     1  12:00:00
2    123  01-01-2018     1  11:00:00
3    123  01-01-2018     1  09:00:00
4    123  01-01-2018     1  08:00:00
5    123  01-01-2018     1  07:00:00
6    124  01-01-2018     2  13:00:00
7    124  01-01-2018     2  12:00:00
8    124  01-01-2018     2  11:00:00
9    124  01-01-2018     2  09:00:00
10   124  01-01-2018     2  08:00:00
11   124  01-01-2018     2  07:00:00
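As for the original error: in pandas versions where groupby(...).apply passes the grouping columns to the function, the result most likely has 'code', 'date' and 'tank' both as index levels and as regular columns, so reset_index() cannot insert them a second time. The sketch below reproduces that mechanism on a small hypothetical frame without any groupby, and shows that reset_index(drop=True) avoids the clash by discarding the index instead of inserting it.

```python
import pandas as pd

# Hypothetical frame in which "tank" is both the index name and a column,
# mimicking the shape that groupby(...).apply(...) can produce.
df = pd.DataFrame(
    {"tank": [1, 2], "time": ["08:00:00", "07:00:00"]},
    index=pd.Index([1, 2], name="tank"),
)

message = ""
try:
    # reset_index() tries to insert the index as a new "tank" column.
    df.reset_index()
except ValueError as exc:
    message = str(exc)
print(message)

# Dropping the index instead of inserting it sidesteps the name clash.
flat = df.reset_index(drop=True)
print(flat)
```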
