I have the following DataFrame in pandas:
code date time tank
123 01-01-2018 08:00:00 1
123 01-01-2018 11:00:00 1
123 01-01-2018 12:00:00 1
123 01-01-2018 13:00:00 1
123 01-01-2018 07:00:00 1
123 01-01-2018 09:00:00 1
124 01-01-2018 08:00:00 2
124 01-01-2018 11:00:00 2
124 01-01-2018 12:00:00 2
124 01-01-2018 13:00:00 2
124 01-01-2018 07:00:00 2
124 01-01-2018 09:00:00 2
I am grouping it and sorting each group by 'time':
df = df.groupby(['code', 'date', 'tank']).apply(lambda x: x.sort_values(['time'], ascending=True)).reset_index()
When I call reset_index(), I get the following error:
ValueError: cannot insert tank, already exists
Answer 0 (score: 2)
How about sorting by each of the groupby key columns, with 'time' in descending order?
df.sort_values(['code', 'date', 'tank', 'time'], ascending=[True]*3 + [False])
code date time tank
3 123 01-01-2018 13:00:00 1
2 123 01-01-2018 12:00:00 1
1 123 01-01-2018 11:00:00 1
5 123 01-01-2018 09:00:00 1
0 123 01-01-2018 08:00:00 1
4 123 01-01-2018 07:00:00 1
9 124 01-01-2018 13:00:00 2
8 124 01-01-2018 12:00:00 2
7 124 01-01-2018 11:00:00 2
11 124 01-01-2018 09:00:00 2
6 124 01-01-2018 08:00:00 2
10 124 01-01-2018 07:00:00 2
This achieves the same effect, but without the groupby.
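For reference, here is a self-contained sketch of the sort above. The frame is rebuilt by hand from the sample in the question, and every column except tank is assumed to hold strings. ignore_index=True (available in pandas 1.0 and later) is an addition that renumbers the rows from 0, which the call above does not do.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (string dtypes are an assumption).
times = ["08:00:00", "11:00:00", "12:00:00", "13:00:00", "07:00:00", "09:00:00"]
df = pd.DataFrame({
    "code": [123] * 6 + [124] * 6,
    "date": ["01-01-2018"] * 12,
    "time": times * 2,
    "tank": [1] * 6 + [2] * 6,
})

keys = ["code", "date", "tank"]

# Key columns ascending, 'time' descending; ignore_index renumbers the rows.
result = df.sort_values(
    keys + ["time"],
    ascending=[True] * 3 + [False],
    ignore_index=True,
)
print(result)
```

Zero-padded HH:MM:SS strings sort correctly as text. Date strings in this format do not, in general, sort chronologically, so with more than one date it may be worth converting the column with pd.to_datetime first.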
If you do need the groupby, you will need two reset_index calls (the first one drops the last index level):
(df.groupby(['code', 'date', 'tank'])
.time.apply(lambda x: x.sort_values(ascending=False))
.reset_index(level=-1, drop=True)
.reset_index())
code date tank time
0 123 01-01-2018 1 13:00:00
1 123 01-01-2018 1 12:00:00
2 123 01-01-2018 1 11:00:00
3 123 01-01-2018 1 09:00:00
4 123 01-01-2018 1 08:00:00
5 123 01-01-2018 1 07:00:00
6 124 01-01-2018 2 13:00:00
7 124 01-01-2018 2 12:00:00
8 124 01-01-2018 2 11:00:00
9 124 01-01-2018 2 09:00:00
10 124 01-01-2018 2 08:00:00
11 124 01-01-2018 2 07:00:00
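As for the original error, reset_index turns every index level into a column and raises if a column with that name already exists. The error message suggests that, after groupby(...).apply(...), the group keys were present both as index levels and as columns. The sketch below first reproduces that collision on a toy frame (toy is a made-up example, not the question's data), then runs the grouped sort end to end on the sample rebuilt by hand, assuming string-typed time values.

```python
import pandas as pd

# Toy frame: the index level and a column share the name "tank".
toy = pd.DataFrame({"tank": [1, 2]}, index=pd.Index([1, 2], name="tank"))
try:
    toy.reset_index()  # tries to insert a second "tank" column
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Sample data rebuilt by hand from the question.
times = ["08:00:00", "11:00:00", "12:00:00", "13:00:00", "07:00:00", "09:00:00"]
df = pd.DataFrame({
    "code": [123] * 6 + [124] * 6,
    "date": ["01-01-2018"] * 12,
    "time": times * 2,
    "tank": [1] * 6 + [2] * 6,
})

keys = ["code", "date", "tank"]

# Selecting only 'time' keeps the key columns out of each group's values,
# so they exist only as index levels and can be reset without a clash.
sorted_time = df.groupby(keys)["time"].apply(lambda s: s.sort_values(ascending=False))

result = (
    sorted_time
    .reset_index(level=-1, drop=True)  # drop the original row labels
    .reset_index()                     # turn code/date/tank back into columns
)
print(result)
```

Selecting the time column before apply is what keeps the key names unique here; the trade-off is that any columns other than the keys and time would not appear in the result.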