DataFrame groupby sort (categorical variable)

Date: 2017-03-08 04:53:09

Tags: python-3.x pandas matplotlib plot

In [167]:
    df

Out[167]:
    Gender  University
0   Male    A
1   Female  B
2   Male    C
3   Male    D
4   Male    E
5   Female  A
6   Female  B
7   Female  C
8   Female  D
9   Female  E

In [168]:
df.groupby(['University','Gender'])['Gender'].size().unstack('Gender').fillna(0)

Out[168]:

[image: table of counts per University, one column per Gender]

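Now I want to sort by Female and Male from highest to lowest, so that the bar plot shows the bars in descending order. I have tried many approaches, to no avail.

In my last attempt, I tried: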

df.groupby(['University','Gender'])['Gender'].size().unstack('Gender').fillna(0).sort_values(ascending=False)

TypeError: sort_values() missing 1 required positional argument: 'by'

Any suggestions?

1 answer:

Answer 0: (score: 1)
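
The TypeError in the question comes from calling sort_values on a DataFrame without saying which column to sort on. A minimal sketch of the difference between the Series and DataFrame forms follows; the small `counts` frame is made up for illustration and is not the question's data.

```python
import pandas as pd

# Illustrative frame only (hypothetical values, not the question's data).
counts = pd.DataFrame({"Female": [1, 3, 2], "Male": [4, 0, 1]},
                      index=["E", "B", "C"])

# DataFrame.sort_values has a required `by` argument, so this raises TypeError.
try:
    counts.sort_values(ascending=False)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A single column is a Series, which sorts by its own values (no `by`).
female_sorted = counts["Female"].sort_values(ascending=False)

# On the frame itself, name the column through `by`.
frame_sorted = counts.sort_values(by="Female", ascending=False)
```

After unstack the result is a DataFrame with one column per gender, which is why `by` becomes mandatory at that point in the chain.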

You can sort by one column or the other:

print (df)
   Gender University
0    Male          A
1  Female          B
3    Male          D
4    Male          E
5  Female          A
2    Male          C
3    Male          D
4    Male          E
5  Female          A
6  Female          B
7  Female          C
8  Female          D
4    Male          E
5  Female          A
6  Female          B
3    Male          D
4    Male          E
5  Female          A
7  Female          C
8  Female          D
9  Female          E
# count rows per (University, Gender), pivot Gender into columns, sort by Female
df1 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by='Female', ascending=False))

print (df1)
Gender      Female  Male
University              
A                4     1
B                3     0
C                2     1
D                2     3
E                1     4

df1.plot.bar()

graph1

# same counts, sorted by the Male column instead
df2 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by='Male', ascending=False))
print (df2)
Gender      Female  Male
University              
E                1     4
D                2     3
A                4     1
C                2     1
B                3     0

df2.plot.bar()

graph2
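
The two single-column sorts above can be reproduced end to end with the sketch below. It rebuilds the rows from the counts printed in the answer; the default integer index is an assumption, since the printed frame had repeated index labels, and the names `rows`, `counts`, `by_female` and `by_male` are only illustrative.

```python
import pandas as pd

# Rebuild the rows from the printed counts (assumption: index labels
# do not matter for the result, so a default RangeIndex is used).
rows = (
    [("Female", "A")] * 4 + [("Male", "A")] * 1
    + [("Female", "B")] * 3
    + [("Female", "C")] * 2 + [("Male", "C")] * 1
    + [("Female", "D")] * 2 + [("Male", "D")] * 3
    + [("Female", "E")] * 1 + [("Male", "E")] * 4
)
df = pd.DataFrame(rows, columns=["Gender", "University"])

# Count rows per (University, Gender), then pivot Gender into columns;
# fill_value=0 covers combinations with no rows (Male at B).
counts = (
    df.groupby(["University", "Gender"])
      .size()
      .unstack("Gender", fill_value=0)
)

# Sort once per column of interest, highest count first.
by_female = counts.sort_values(by="Female", ascending=False)
by_male = counts.sort_values(by="Male", ascending=False)
```

Computing `counts` once and sorting it twice avoids repeating the groupby chain for each plot.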

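If you sort by both columns, the second column only decides the order of rows that tie on the first (here rows D and C, which both have Female = 2):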

# Female is the primary sort key; Male only breaks ties
df3 = (df.groupby(['University','Gender'])['Gender']
         .size()
         .unstack('Gender', fill_value=0)
         .sort_values(by=['Female', 'Male'], ascending=False))
print (df3)

df3.plot.bar()

graph
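
The tie-breaking behaviour of the two-column sort can be checked on the counts table alone. The sketch below copies the counts printed earlier in the answer into a frame directly; the `mixed` variant with a per-column `ascending` list is an extra illustration, not part of the original answer.

```python
import pandas as pd

# Counts copied from the table printed earlier in the answer.
counts = pd.DataFrame(
    {"Female": [4, 3, 2, 2, 1], "Male": [1, 0, 1, 3, 4]},
    index=pd.Index(["A", "B", "C", "D", "E"], name="University"),
)
counts.columns.name = "Gender"

# Female is the primary key; Male only orders rows whose Female counts tie.
df3 = counts.sort_values(by=["Female", "Male"], ascending=False)
print(df3)

# A list for `ascending` sets the direction per column:
# Female descending, Male ascending within ties.
mixed = counts.sort_values(by=["Female", "Male"], ascending=[False, True])
```

C and D tie on Female = 2, so Male decides between them: D (Male = 3) comes before C (Male = 1) in `df3`, and the order flips in `mixed`.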