Level-wise sorting of a pandas Series

Date: 2016-08-20 12:24:00

Tags: python python-3.x sorting pandas series

I have a DataFrame named mydf:

mydf

I run the following operation on it, which turns the result into a Series:

mydf.groupby([mydf.type,mydf.name]).size()

Now I have a Series whose `type` level has two values, actor and actress:

    type      name               
    actor    'Big' Ben Moroz        1
             'Ducky' Louie          3
             'Fast' Eddie Mahler    1
             'King Kong' Kashey     1
             'Muddy' Berry          1

    actress   Zedra Conde           3
              Zena Marshall         1
              Zinaida Morskaya      1
              Zoe Holland           1
              Zoia Karabanova       2
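Because mydf itself only appeared as a screenshot, the sketch below uses a hypothetical stand-in (one row per appearance, built only from the names and counts listed above). It shows that grouping on two columns and calling size() yields a Series indexed by a two-level (type, name) MultiIndex.

```python
import pandas as pd

# Hypothetical stand-in for mydf: one row per appearance, so that the
# per-(type, name) counts match the listing in the question.
rows = (
    [("actor", "'Big' Ben Moroz")] * 1
    + [("actor", "'Ducky' Louie")] * 3
    + [("actor", "'Fast' Eddie Mahler")] * 1
    + [("actor", "'King Kong' Kashey")] * 1
    + [("actor", "'Muddy' Berry")] * 1
    + [("actress", "Zedra Conde")] * 3
    + [("actress", "Zena Marshall")] * 1
    + [("actress", "Zinaida Morskaya")] * 1
    + [("actress", "Zoe Holland")] * 1
    + [("actress", "Zoia Karabanova")] * 2
)
mydf = pd.DataFrame(rows, columns=["type", "name"])

# Grouping by two columns and counting rows gives a Series whose index is
# a MultiIndex with levels named 'type' and 'name'.
counts = mydf.groupby(["type", "name"]).size()
print(counts)
```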

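Now I want the result sorted in descending order of the count (the third, unnamed column) within the actor level; where two counts are equal, the rows must be ordered by `name`. The same pattern must then be followed within the other level, actress. The desired result looks like this: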

type      name               
actor    'Ducky' Louie          3
         'Big' Ben Moroz        1
         'Fast' Eddie Mahler    1
         'King Kong' Kashey     1
         'Muddy' Berry          1

actress   Zedra Conde           3
          Zoia Karabanova       2
          Zena Marshall         1
          Zinaida Morskaya      1
          Zoe Holland           1

Note: please avoid loops.
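The requirement amounts to a single multi-key sort with no loop: `type` ascending, count descending, `name` ascending. A minimal sketch, assuming pandas 0.23 or later (where `DataFrame.sort_values` accepts index level names in `by`) and a hypothetical `counts` Series rebuilt from the listing above:

```python
import pandas as pd

# Hypothetical counts Series shaped like the one in the question:
# a two-level MultiIndex (type, name) with the counts as values.
counts = pd.Series(
    [1, 3, 1, 1, 1, 3, 1, 1, 1, 2],
    index=pd.MultiIndex.from_tuples(
        [
            ("actor", "'Big' Ben Moroz"),
            ("actor", "'Ducky' Louie"),
            ("actor", "'Fast' Eddie Mahler"),
            ("actor", "'King Kong' Kashey"),
            ("actor", "'Muddy' Berry"),
            ("actress", "Zedra Conde"),
            ("actress", "Zena Marshall"),
            ("actress", "Zinaida Morskaya"),
            ("actress", "Zoe Holland"),
            ("actress", "Zoia Karabanova"),
        ],
        names=["type", "name"],
    ),
)

# Name the values 'count', then sort on the 'type' level (ascending), the
# 'count' column (descending) and the 'name' level (ascending) in one call.
result = counts.to_frame("count").sort_values(
    by=["type", "count", "name"], ascending=[True, False, True]
)["count"]
print(result)
```

Because `type` is the primary key, each level stays together, so the "sort within each level" behaviour falls out of an ordinary global sort.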

1 Answer:

Answer 0 (score: 0)

Unfortunately, everything I came up with requires grouping/sorting twice. Suppose we have a DataFrame:

import pandas as pd

# Fixed sample data so that the output below is reproducible
d = pd.DataFrame({'name': ['c', 'c', 'a', 'b', 'a', 'c', 'c', 'c', 'a', 'a'],
                  'type': ['actor']*5 + ['actress']*5})
d


    name    type
0   c   actor
1   c   actor
2   a   actor
3   b   actor
4   a   actor
5   c   actress
6   c   actress
7   c   actress
8   a   actress
9   a   actress


d.groupby([d.type,d.name]).size()

type     name
actor    a       2
         b       1
         c       2
actress  a       2
         c       3
dtype: int64

Method 1:

d.groupby([d.type,d.name]).size().groupby(level=[0]).apply(lambda x: x.sort_values(ascending=False))

type     type     name
actor    actor    c       2
                  a       2
                  b       1
actress  actress  c       3
                  a       2
dtype: int64
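Method 1 sorts only by the count, so the tie between `a` and `c` under actor is not broken by name (`c` comes first above), and `apply` prepends the `type` level a second time. A minimal sketch of a variant, assuming a pandas version that supports `group_keys=False`: sort each group by name first, then apply a stable (`kind="mergesort"`) descending sort on the counts, so equal counts keep their alphabetical order. The helper `sort_group` is a hypothetical name introduced here.

```python
import pandas as pd

# Same data as in the answer, written out explicitly so the output is reproducible.
d = pd.DataFrame({
    "name": ["c", "c", "a", "b", "a", "c", "c", "c", "a", "a"],
    "type": ["actor"] * 5 + ["actress"] * 5,
})

counts = d.groupby(["type", "name"]).size()

def sort_group(x):
    # Order by name first, then do a stable descending sort on the counts,
    # so rows with equal counts stay in alphabetical order of name.
    return x.sort_index(level="name").sort_values(ascending=False, kind="mergesort")

# group_keys=False stops apply from adding the 'type' level a second time.
result = counts.groupby(level=0, group_keys=False).apply(sort_group)
print(result)
```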

Method 2:

d1 = d.groupby([d.type, d.name]).size()
# Turn the index levels into columns and name the count column 'sz'
d2 = d1.reset_index(name='sz')
# type ascending, count descending, then name ascending to break ties
d2.sort_values(by=['type', 'sz', 'name'], ascending=[True, False, True])

    type    name    sz
0   actor   a   2
2   actor   c   2
1   actor   b   1
4   actress c   3
3   actress a   2
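Method 2 returns a flat DataFrame. If the result should keep the two-level Series form from the question, one option (a sketch, not part of the original answer) is to chain `set_index` after the sort and select the count column:

```python
import pandas as pd

# Same data as in the answer, written out explicitly so the output is reproducible.
d = pd.DataFrame({
    "name": ["c", "c", "a", "b", "a", "c", "c", "c", "a", "a"],
    "type": ["actor"] * 5 + ["actress"] * 5,
})

# Method 2, then restore the (type, name) MultiIndex so the result is a
# Series shaped like the original groupby(...).size() output.
result = (
    d.groupby(["type", "name"]).size()
    .reset_index(name="sz")
    .sort_values(by=["type", "sz", "name"], ascending=[True, False, True])
    .set_index(["type", "name"])["sz"]
)
print(result)
```

This needs only one groupby and one sort; the `reset_index`/`set_index` round trip just moves the index levels into sortable columns and back.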