import pandas as pd

# Grades as a single column, labelled by a quality index
df = pd.DataFrame(['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'],
                  index=['excellent', 'excellent', 'excellent', 'good', 'good', 'good', 'ok', 'ok', 'ok', 'poor', 'poor'])
df.rename(columns={0: 'Grades'}, inplace=True)
# Ordered categorical dtype, from lowest ('D') to highest ('A+')
cat_dtype = pd.CategoricalDtype(categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'], ordered=True)
print(df['Grades'].astype(cat_dtype))
print(df['Grades'] > 'C')
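When I inspect the cat_dtype object, the order is clearly correct, with 'A+' the largest and 'D' the smallest. But when I compare the values in the 'Grades' column against 'C', the result does not follow that order. What is going on?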
Answer 0 (score: 2)
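It looks like you just need to assign the result of astype back to df['Grades'], because astype returns a new Series and leaves the original column unchanged: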
df['Grades'] = df['Grades'].astype(cat_dtype)
Filtering on your condition then gives the expected output:
df[df['Grades'] > 'C']
          Grades
excellent     A+
excellent      A
excellent     A-
good          B+
good           B
good          B-
ok            C+
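To make the difference concrete, here is a minimal sketch that runs the same comparison before and after the conversion. It uses a plain Series instead of the DataFrame from the question, and the names s, converted, plain and ordered are just illustrative. Without the conversion the comparison is an ordinary string comparison, so it follows alphabetical order rather than the grade order.

```python
import pandas as pd

# Same grades and category order as in the question
s = pd.Series(['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'])
cat_dtype = pd.CategoricalDtype(
    categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'],
    ordered=True,
)

# astype returns a new Series; s itself keeps its original string dtype
converted = s.astype(cat_dtype)

# Plain strings: '>' compares lexicographically
plain = s[s > 'C'].tolist()

# Ordered categorical: '>' follows the declared category order
ordered = converted[converted > 'C'].tolist()

print(plain)    # ['C+', 'C-', 'D+', 'D']
print(ordered)  # ['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+']
```

Note that comparing an ordered categorical against a scalar only works when that scalar is one of the declared categories, which 'C' is here.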