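I have a DataFrame with an "education" attribute. The values are discrete integers from 1 to 16. For cross-tabulation, I want to bin this "education" variable using custom categories (1:8, 9:11, 12, 13:15, 16).
I've been experimenting with pd.cut(), but I get an invalid syntax error: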
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'], bins=[1:8, 9, 10:11, 12, 13:15, 16], labels = ['Middle School or less', 'Some High School', 'High School Grad', 'Some College', 'College Grad'])
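Answer 0 (score: 1)
pd.cut expects bins to be a single increasing sequence of edges, not a list of ranges, so slice-style entries such as 1:8 are not valid inside a list. Try placing the bin edges halfway between the thresholds: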
import numpy as np
import pandas as pd

# Six edges define five bins; each bin is (left, right] by default.
bins = [0.5, 8.5, 11.5, 12.5, 15.5, 16.5]
labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']
adult_df_educrace['education_bins'] = pd.cut(x=adult_df_educrace['education'],
                                             bins=bins,
                                             labels=labels)
Test:
adult_df_educrace = pd.DataFrame({'education':np.arange(1,17)})
Output:
    education         education_bins
0           1  Middle School or less
1           2  Middle School or less
2           3  Middle School or less
3           4  Middle School or less
4           5  Middle School or less
5           6  Middle School or less
6           7  Middle School or less
7           8  Middle School or less
8           9       Some High School
9          10       Some High School
10         11       Some High School
11         12       High School Grad
12         13           Some College
13         14           Some College
14         15           Some College
15         16           College Grad
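
As a cross-check, here is a minimal, self-contained sketch that compares the half-step edges with plain integer edges. It assumes the default right=True behaviour of pd.cut, where each bin includes its right edge but not its left one, so the edges [0, 8, 11, 12, 15, 16] should describe the same five groups. The names df, half_step, and integer_edges are only illustrative and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Same test data as in the answer: education codes 1..16.
df = pd.DataFrame({'education': np.arange(1, 17)})

labels = ['Middle School or less', 'Some High School',
          'High School Grad', 'Some College', 'College Grad']

# Half-step edges, as in the answer.
half_step = pd.cut(df['education'],
                   bins=[0.5, 8.5, 11.5, 12.5, 15.5, 16.5],
                   labels=labels)

# Integer edges: with right=True (the default) the bins are
# (0, 8], (8, 11], (11, 12], (12, 15], (15, 16].
integer_edges = pd.cut(df['education'],
                       bins=[0, 8, 11, 12, 15, 16],
                       labels=labels)

# Compare the assigned labels row by row (as plain strings).
same = half_step.astype(str).equals(integer_edges.astype(str))
print(same)

# Number of education codes that fall into each category.
print(integer_edges.value_counts(sort=False))
```

Either set of edges can be used. The half-step version makes it obvious which side of an edge each integer falls on, while the integer version relies on remembering that the right edge is the inclusive one.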