I have a DataFrame named mydf:
mydf
I run the following operation on it, which turns it into a Series:
mydf.groupby([mydf.type,mydf.name]).size()
Now I have a Series whose type level has two values, actor and actress:
type name
actor 'Big' Ben Moroz 1
'Ducky' Louie 3
'Fast' Eddie Mahler 1
'King Kong' Kashey 1
'Muddy' Berry 1
actress Zedra Conde 3
Zena Marshall 1
Zinaida Morskaya 1
Zoe Holland 1
Zoia Karabanova 2
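Now I want the result sorted in descending order of the count (the third, unnamed column) within the actor level. If two counts are equal, the tie must be broken by name. The same pattern must then apply within the other level, actress: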
type name
actor 'Ducky' Louie 3
'Big' Ben Moroz 1
'Fast' Eddie Mahler 1
'King Kong' Kashey 1
'Muddy' Berry 1
actress Zedra Conde 3
Zoia Karabanova 2
Zena Marshall 1
Zinaida Morskaya 1
Zoe Holland 1
Note: please avoid loops.
Answer 0 (score: 0)
Unfortunately, everything I came up with requires grouping or sorting twice. Suppose we have a DataFrame:
import pandas as pd
import numpy as np
import random
d = pd.DataFrame({'type': ['actor']*5+['actress']*5,
'name' : [random.choice(['a', 'b', 'c']) for i in range(10)]})
d
name type
0 c actor
1 c actor
2 a actor
3 b actor
4 a actor
5 c actress
6 c actress
7 c actress
8 a actress
9 a actress
d.groupby([d.type,d.name]).size()
type name
actor a 2
b 1
c 2
actress a 2
c 3
dtype: int64
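Because the names above are drawn with random.choice, rerunning the snippet gives different counts each time. As a minimal sketch, the frame can be rebuilt with the names hard-coded from the printed output so that the grouped sizes shown are reproducible. Passing column labels to groupby instead of the Series objects is only a stylistic assumption here; the variable name counts is illustrative.

```python
import pandas as pd

# Names copied from the printed frame above, so the result is deterministic
# (random.choice would produce different names on every run).
d = pd.DataFrame({
    'type': ['actor'] * 5 + ['actress'] * 5,
    'name': ['c', 'c', 'a', 'b', 'a', 'c', 'c', 'c', 'a', 'a'],
})

# Number of rows per (type, name) pair; the result is a Series
# with a two-level index (type, name).
counts = d.groupby(['type', 'name']).size()
print(counts)
```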
Method 1:
d.groupby([d.type,d.name]).size().groupby(level=[0]).apply(lambda x: x.sort_values(ascending=False))
type type name
actor actor c 2
a 2
b 1
actress actress c 3
a 2
dtype: int64
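The output of Method 1 repeats the type level because apply prepends the group key to each sorted piece. A possible variant, sketched here on hard-coded sample data, passes group_keys=False so the original two-level index is kept. Note that this variant, like Method 1, leaves ties in whatever order sort_values yields; Method 2 sorts ties by name explicitly. The names counts and ranked are illustrative.

```python
import pandas as pd

# Sample data hard-coded from the frame printed earlier in this answer.
d = pd.DataFrame({
    'type': ['actor'] * 5 + ['actress'] * 5,
    'name': ['c', 'c', 'a', 'b', 'a', 'c', 'c', 'c', 'a', 'a'],
})
counts = d.groupby(['type', 'name']).size()

# group_keys=False stops apply from prepending the group key,
# so the result keeps the original (type, name) index.
ranked = counts.groupby(level=0, group_keys=False).apply(
    lambda x: x.sort_values(ascending=False)
)
print(ranked)
```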
Method 2:
d1 = d.groupby([d.type,d.name]).size()
d2 = d1.reset_index()
d2.columns = ['type', 'name', 'sz']
d2.sort_values(by = ['type', 'sz', 'name'], ascending = [True, False, True])
type name sz
0 actor a 2
2 actor c 2
1 actor b 1
4 actress c 3
3 actress a 2
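Method 2 returns a DataFrame, while the question asks for a Series indexed by type and name. A minimal sketch of the full round trip, using the counts from the question's output as hard-coded input, could look like the following. The column name sz and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Counts copied from the question's output: a Series indexed by (type, name).
index = pd.MultiIndex.from_tuples(
    [
        ('actor', "'Big' Ben Moroz"),
        ('actor', "'Ducky' Louie"),
        ('actor', "'Fast' Eddie Mahler"),
        ('actor', "'King Kong' Kashey"),
        ('actor', "'Muddy' Berry"),
        ('actress', 'Zedra Conde'),
        ('actress', 'Zena Marshall'),
        ('actress', 'Zinaida Morskaya'),
        ('actress', 'Zoe Holland'),
        ('actress', 'Zoia Karabanova'),
    ],
    names=['type', 'name'],
)
counts = pd.Series([1, 3, 1, 1, 1, 3, 1, 1, 1, 2], index=index)

result = (
    counts.reset_index(name='sz')          # Series -> DataFrame with a 'sz' column
    .sort_values(by=['type', 'sz', 'name'],
                 ascending=[True, False, True])  # count descending, name breaks ties
    .set_index(['type', 'name'])['sz']     # back to a Series with the two-level index
)
print(result)
```

A single multi-key sort avoids both loops and a second groupby, since type is simply the first sort key.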