Pandas: find the average of string occurrences

Time: 2017-08-15 21:22:14

Tags: python pandas pandas-groupby

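I'm working with a DataFrame and trying to compute averages, but I'm stuck converting the value counts of my grouped df into averages. The code is below: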

df2 = df.groupby(['school', 'Race/Ethnicity']).size()

school          Race/Ethnicity                        
school1         African American/Black                     15
                American Indian/Alaska Native               1
                Bi-racial/Multi-racial                      4
                Latino/a                                   53
                Other - Write In (Required)                 1
                White                                       2
school2         African American/Black                      1
                American Indian/Alaska Native               5
                Asian                                       1
                Bi-Racial/Multi-Racial                      1
                Latino/a                                   26

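I have many different schools, and instead of the size I want the average for each race within each school, i.e. each count as a share of its school's total. How can I iterate over the groups to find each group's sum and then divide each row by its group's sum?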

1 answer:

Answer 0: (score: 1)

Use the normalize parameter of value_counts:

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True)

school   Race/Ethnicity               
school1  Latino/a                         0.697368
         African American/Black           0.197368
         Bi-racial/Multi-racial           0.052632
         White                            0.026316
         American Indian/Alaska Native    0.013158
         Other - Write In (Required)      0.013158
school2  Latino/a                         0.764706
         American Indian/Alaska Native    0.147059
         African American/Black           0.029412
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
Name: Race/Ethnicity, dtype: float64

You can also skip the sorting:

df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True, sort=False)

school   Race/Ethnicity               
school1  African American/Black           0.197368
         American Indian/Alaska Native    0.013158
         Bi-racial/Multi-racial           0.052632
         Latino/a                         0.697368
         Other - Write In (Required)      0.013158
         White                            0.026316
school2  African American/Black           0.029412
         American Indian/Alaska Native    0.147059
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
         Latino/a                         0.764706
Name: Race/Ethnicity, dtype: float64
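
The plan described in the question (sum each group, then divide each row by its group's sum) can also be written out directly, without an explicit loop. The following is a minimal sketch, assuming a small made-up frame in place of the real data; the names counts, totals and shares are illustrative and not part of the original answer.

```python
import pandas as pd

# Toy data (an assumption for illustration): 4 rows for school1, 2 for school2.
df = pd.DataFrame(
    [['school1', 'Latino/a']] * 3 +
    [['school1', 'White']] * 1 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Latino/a']] * 1,
    columns=['school', 'Race/Ethnicity']
)

# Count rows per (school, race) pair, as in the question.
counts = df.groupby(['school', 'Race/Ethnicity']).size()

# Broadcast each school's total back onto every row of that school.
totals = counts.groupby(level='school').transform('sum')

# Divide each count by its school's total to get the share.
shares = counts / totals
print(shares)
```

This should yield the same proportions as value_counts(normalize=True, sort=False), but it keeps the intermediate counts and totals around if they are needed as well.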

Setup

import pandas as pd

df = pd.DataFrame(
    [['school1', 'African American/Black']] * 15 +
    [['school1', 'American Indian/Alaska Native']] * 1 + 
    [['school1', 'Bi-racial/Multi-racial']] * 4 +
    [['school1', 'Latino/a']] * 53 +
    [['school1', 'Other - Write In (Required)']] * 1 +
    [['school1', 'White']] * 2 +
    [['school2', 'African American/Black']] * 1 +
    [['school2', 'American Indian/Alaska Native']] * 5 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Bi-Racial/Multi-Racial']] * 1 +
    [['school2', 'Latino/a']] * 26,
    columns=['school', 'Race/Ethnicity']
)
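
If a wide table (one row per school, one column per race) is easier to read than the stacked Series, pd.crosstab with normalize='index' is one possible alternative. This is a sketch on a small made-up frame rather than the setup data above, and the variable name table is illustrative; it is not part of the original answer.

```python
import pandas as pd

# Toy data (an assumption for illustration): 4 rows for school1, 2 for school2.
df = pd.DataFrame(
    [['school1', 'Latino/a']] * 3 +
    [['school1', 'White']] * 1 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Latino/a']] * 1,
    columns=['school', 'Race/Ethnicity']
)

# normalize='index' divides each row (school) by its own row total,
# so every row of the result sums to 1.
table = pd.crosstab(df['school'], df['Race/Ethnicity'], normalize='index')
print(table)
```

Combinations that never occur in a school show up as 0.0 in this layout, whereas they are simply absent from the grouped Series.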