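I'd like to know whether I can apply the describe function grouped by the distinct values of a particular column.
For example, say I have the following DataFrame: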
import pandas as pd
# Note: every label is exactly 'male' or 'female' (no stray leading spaces)
data = {'gender': ['male', 'female', 'female', 'male', 'female'],
        'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns=['gender', 'name', 'age', 'preTestScore', 'postTestScore'])
Now, if I call describe, I get descriptive statistics for the numeric columns of the whole DataFrame:
df.describe()
             age  preTestScore  postTestScore
count   5.000000      5.000000       5.000000
mean   45.400000     12.800000      61.600000
std    18.460769     13.663821      24.905823
min    24.000000      2.000000      25.000000
25%    36.000000      3.000000      57.000000
50%    42.000000      4.000000      62.000000
75%    52.000000     24.000000      70.000000
max    73.000000     31.000000      94.000000
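What should I do if I want to group by gender and get the descriptive statistics for each gender (possibly as two separate outputs)?
Answer 0 (score: 1)
If you need two separate outputs, you can filter the DataFrame by gender and call describe on each subset: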
df[df.gender == 'male'].describe()
df[df.gender == 'female'].describe()
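If the distinct values are not known in advance, the same idea can be written as a loop over the groups. This is only a sketch: the name stats_by_gender is made up for illustration, and the name column is left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'female', 'female', 'male', 'female'],
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

# Build one describe() table per distinct gender, keyed by the group label.
# stats_by_gender is a hypothetical name, not part of pandas.
stats_by_gender = {gender: group.describe() for gender, group in df.groupby('gender')}

# Print each table separately.
for gender, stats in stats_by_gender.items():
    print(gender)
    print(stats)
```

Each value in the dictionary is an ordinary describe result, so a single statistic can be read with something like stats_by_gender['female'].loc['mean', 'age'].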
Answer 1 (score: 1)
You can use groupby.describe:
df.groupby('gender').describe()
Out:
                    age  postTestScore  preTestScore
gender
female count   3.000000       3.000000      3.000000
       mean   53.666667      73.666667     19.333333
       std    18.556221      18.770544     14.571662
       min    36.000000      57.000000      3.000000
       25%    44.000000      63.500000     13.500000
       50%    52.000000      70.000000     24.000000
       75%    62.500000      82.000000     27.500000
       max    73.000000      94.000000     31.000000
male   count   2.000000       2.000000      2.000000
       mean   33.000000      43.500000      3.000000
       std    12.727922      26.162951      1.414214
       min    24.000000      25.000000      2.000000
       25%    28.500000      34.250000      2.500000
       50%    33.000000      43.500000      3.000000
       75%    37.500000      52.750000      3.500000
       max    42.000000      62.000000      4.000000
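The output above comes from an older pandas release. As far as I know, newer versions return one row per group, with the statistics nested under each column name. The sketch below shows that layout, and one possible way to get the stacked layout back by applying describe to each group. The names numeric_cols, wide and long_form are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'female', 'female', 'male', 'female'],
    'age': [42, 52, 36, 24, 73],
    'preTestScore': [4, 24, 31, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70],
})

# Hypothetical helper list: the numeric columns to summarise.
numeric_cols = ['age', 'preTestScore', 'postTestScore']

# Wide layout: one row per gender, columns are (column name, statistic) pairs.
wide = df.groupby('gender')[numeric_cols].describe()
print(wide.loc['female', ('age', 'mean')])

# Stacked layout: index is (gender, statistic), one column per numeric field.
long_form = df.groupby('gender')[numeric_cols].apply(lambda g: g.describe())
print(long_form.loc['male'])
```

Selecting long_form.loc['male'] or long_form.loc['female'] also gives the two separate per-gender tables asked for in the question.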