Binning values into custom bins

Time: 2019-10-09 19:33:46

Tags: python pandas numpy bins

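I have a dataframe with an "education" attribute. The values are discrete integers from 1 to 16. For cross-tabulation, I want to bin this "education" variable, but using custom bins (1:8, 9:11, 12, 13:15, 16).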

I've been messing around with pd.cut(), but I get an invalid syntax error:

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])

1 answer:

Answer 0 (score: 1)

Pass the bins as a flat list of edges, and try making the edges fall between the thresholds:

bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels=['Middle School or less', 'Some High School', 
        'High School Grad', 'Some College', 'College Grad']

adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)
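
The syntax error in the question comes from the slice notation: something like `1:8` is not valid inside a list literal, and `bins` expects a flat, increasing sequence of edges. Since `pd.cut` uses right-closed intervals by default (`right=True`), plain integer edges should give the same grouping as the half-integer ones. A minimal sketch, assuming the same labels; the names `demo`, `half_bins`, and `int_bins` are made up for illustration:

```python
import numpy as np
import pandas as pd

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

# Illustrative frame covering every education value from 1 to 16
demo = pd.DataFrame({'education': np.arange(1, 17)})

# Edges between the thresholds, as in the answer
half_bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
# Integer edges: with right=True the intervals are (0, 8], (8, 11], (11, 12], ...
int_bins = [0, 8, 11, 12, 15, 16]

by_half = pd.cut(demo['education'], bins=half_bins, labels=labels)
by_int = pd.cut(demo['education'], bins=int_bins, labels=labels)

# Both edge choices should assign every value to the same label
same = by_half.tolist() == by_int.tolist()
print(same)
```

The half-integer edges have the advantage of not depending on whether the intervals are closed on the left or the right.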

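Test data (assumes `import numpy as np` and `import pandas as pd`; build this frame before running the code above):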

adult_df_educrace = pd.DataFrame({'education':np.arange(1,17)})

Output:

    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
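
Since the goal was cross-tabulation, the binned column can then be passed to `pd.crosstab`. A rough sketch on toy data; the `race` column and its values are hypothetical and only guessed from the dataframe's name, so substitute whichever column you actually want to tabulate against:

```python
import pandas as pd

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']
bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]

# Toy data for illustration only; 'race' is a hypothetical second column
toy = pd.DataFrame({
    'education': [3, 9, 12, 12, 14, 16],
    'race': ['A', 'B', 'A', 'B', 'A', 'B'],
})

toy['education_bins'] = pd.cut(toy['education'], bins=bins, labels=labels)

# Count rows for each (education bin, race) pair
table = pd.crosstab(toy['education_bins'], toy['race'])
print(table)
```

Because `pd.cut` returns an ordered categorical, the rows of the table follow the order of `labels` rather than alphabetical order.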