Using the pandas describe function with groupby (Python 3.5.1)

Time: 2016-10-31 16:30:14

Tags: python pandas

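Is there a way to apply the describe function per group, using the distinct values of a particular column as the groups?

For example, suppose I have the following DataFrame: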

import pandas as pd
data = {'gender': ['male', 'female', 'female', 'male', 'female'],
        'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['gender', 'name', 'age', 'preTestScore', 'postTestScore'])

Now, if I call describe, I get descriptive statistics for the whole DataFrame:

df.describe()
             age  preTestScore  postTestScore
count   5.000000      5.000000       5.000000
mean   45.400000     12.800000      61.600000
std    18.460769     13.663821      24.905823
min    24.000000      2.000000      25.000000
25%    36.000000      3.000000      57.000000
50%    42.000000      4.000000      62.000000
75%    52.000000     24.000000      70.000000
max    73.000000     31.000000      94.000000

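If I want to group by gender and get the descriptive statistics for each gender (possibly as two separate outputs), how can I do that?

2 Answers:

Answer 0: (score: 1)

If you need two separate outputs, you can filter the DataFrame by gender and call describe on each subset: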

df[df.gender == 'male'].describe()
df[df.gender == 'female'].describe()
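
If there are more than two distinct values, writing one filter per value gets tedious. The following is a minimal sketch of the same idea that iterates over the groups instead; the names `numeric_cols` and `summaries` are my own and not part of the question, and the data is the question's DataFrame with every label spelled 'female' or 'male'.

```python
import pandas as pd

data = {'gender': ['male', 'female', 'female', 'male', 'female'],
        'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data)

# Hypothetical helper name: the numeric columns we want summarized.
numeric_cols = ['age', 'preTestScore', 'postTestScore']

# Iterating over a groupby yields (group label, sub-DataFrame) pairs,
# so each gender gets its own separate describe() table.
summaries = {gender: group[numeric_cols].describe()
             for gender, group in df.groupby('gender')}

for gender, summary in summaries.items():
    print(gender)
    print(summary)
    print()
```

Each value in `summaries` is an ordinary DataFrame, so `summaries['male']` can be saved or printed independently of the others.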

Answer 1: (score: 1)

You can use groupby.describe:

df.groupby('gender').describe()
Out: 
                    age  postTestScore  preTestScore
gender                                              
female count   3.000000       3.000000      3.000000
       mean   53.666667      73.666667     19.333333
       std    18.556221      18.770544     14.571662
       min    36.000000      57.000000      3.000000
       25%    44.000000      63.500000     13.500000
       50%    52.000000      70.000000     24.000000
       75%    62.500000      82.000000     27.500000
       max    73.000000      94.000000     31.000000
male   count   2.000000       2.000000      2.000000
       mean   33.000000      43.500000      3.000000
       std    12.727922      26.162951      1.414214
       min    24.000000      25.000000      2.000000
       25%    28.500000      34.250000      2.500000
       50%    33.000000      43.500000      3.000000
       75%    37.500000      52.750000      3.500000
       max    42.000000      62.000000      4.000000
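
The output above has one row per (gender, statistic) pair. As far as I know, more recent pandas releases return a wide table from the same call instead: one row per gender, with (column, statistic) pairs as the column labels. The sketch below assumes such a recent pandas version and shows how to get back to the long layout with `stack()`; the names `numeric_cols`, `wide` and `long` are only illustrative.

```python
import pandas as pd

data = {'gender': ['male', 'female', 'female', 'male', 'female'],
        'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data)

# Select the numeric columns explicitly so the result does not depend on
# how non-numeric columns such as 'name' are handled.
numeric_cols = ['age', 'preTestScore', 'postTestScore']

# Assumed (recent pandas): one row per gender, MultiIndex columns of
# (column name, statistic).
wide = df.groupby('gender')[numeric_cols].describe()

# Move the statistic level from the columns into the row index, which
# gives one row per (gender, statistic) pair.
long = wide.stack()

print(wide['age'])          # statistics of 'age' for each gender
print(long.loc['female'])   # all statistics for the 'female' group
```

Selecting `wide['age']` is a convenient way to compare a single column across groups, while `long.loc['female']` gives a table shaped like the result of a plain `describe()` call for one group.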