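I'm working with a DataFrame and trying to compute averages, but I'm stuck on converting the value counts of my grouped df into averages. The code is as follows: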
df2 = df.groupby(['school', 'Race/Ethnicity']).size()
school   Race/Ethnicity
school1  African American/Black           15
         American Indian/Alaska Native     1
         Bi-racial/Multi-racial            4
         Latino/a                         53
         Other - Write In (Required)       1
         White                             2
school2  African American/Black            1
         American Indian/Alaska Native     5
         Asian                             1
         Bi-Racial/Multi-Racial            1
         Latino/a                         26
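I have many different schools, and instead of the size I want the share of each race within each school. How can I iterate over the groups to find each group's total, and then divide each row by its group's total?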
Answer 0: (score: 1)
Use the normalize parameter of value_counts:
df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True)
school   Race/Ethnicity
school1  Latino/a                         0.697368
         African American/Black           0.197368
         Bi-racial/Multi-racial           0.052632
         White                            0.026316
         American Indian/Alaska Native    0.013158
         Other - Write In (Required)      0.013158
school2  Latino/a                         0.764706
         American Indian/Alaska Native    0.147059
         African American/Black           0.029412
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
Name: Race/Ethnicity, dtype: float64
You can also skip the sorting by frequency, which keeps the categories in group order:
df.groupby('school')['Race/Ethnicity'].value_counts(normalize=True, sort=False)
school   Race/Ethnicity
school1  African American/Black           0.197368
         American Indian/Alaska Native    0.013158
         Bi-racial/Multi-racial           0.052632
         Latino/a                         0.697368
         Other - Write In (Required)      0.013158
         White                            0.026316
school2  African American/Black           0.029412
         American Indian/Alaska Native    0.147059
         Asian                            0.029412
         Bi-Racial/Multi-Racial           0.029412
         Latino/a                         0.764706
Name: Race/Ethnicity, dtype: float64
Setup
import pandas as pd

# One row per respondent; the repeat counts match the sizes shown in the question.
df = pd.DataFrame(
    [['school1', 'African American/Black']] * 15 +
    [['school1', 'American Indian/Alaska Native']] * 1 +
    [['school1', 'Bi-racial/Multi-racial']] * 4 +
    [['school1', 'Latino/a']] * 53 +
    [['school1', 'Other - Write In (Required)']] * 1 +
    [['school1', 'White']] * 2 +
    [['school2', 'African American/Black']] * 1 +
    [['school2', 'American Indian/Alaska Native']] * 5 +
    [['school2', 'Asian']] * 1 +
    [['school2', 'Bi-Racial/Multi-Racial']] * 1 +
    [['school2', 'Latino/a']] * 26,
    columns=['school', 'Race/Ethnicity']
)
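For comparison, the manual route described in the question (find each school's total, then divide each row by it) might look roughly like the sketch below. It is only an illustration, not part of the original answer: the names counts_by_school, rows, counts, totals and shares are made up here, and the data is rebuilt in compact form so the snippet runs on its own. No explicit loop over the groups is needed, because groupby(level='school').transform('sum') broadcasts each school's total back onto every row of that school.

```python
import pandas as pd

# Same counts as in the question, rebuilt compactly (illustrative helper dict).
counts_by_school = {
    'school1': {
        'African American/Black': 15,
        'American Indian/Alaska Native': 1,
        'Bi-racial/Multi-racial': 4,
        'Latino/a': 53,
        'Other - Write In (Required)': 1,
        'White': 2,
    },
    'school2': {
        'African American/Black': 1,
        'American Indian/Alaska Native': 5,
        'Asian': 1,
        'Bi-Racial/Multi-Racial': 1,
        'Latino/a': 26,
    },
}

# Expand the counts into one row per respondent.
rows = [
    [school, race]
    for school, races in counts_by_school.items()
    for race, n in races.items()
    for _ in range(n)
]
df = pd.DataFrame(rows, columns=['school', 'Race/Ethnicity'])

# Step 1: number of rows per (school, race) pair, as in the question.
counts = df.groupby(['school', 'Race/Ethnicity']).size()

# Step 2: total per school, repeated for every row belonging to that school.
totals = counts.groupby(level='school').transform('sum')

# Step 3: divide each count by its school's total to get the share.
shares = counts / totals

print(shares.round(6))
```

This should give the same proportions as value_counts(normalize=True, sort=False), for example 53 / 76 for Latino/a at school1. The value_counts version is shorter; the manual one can be handy if the counts Series is all you have.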