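TL;DR: How can I plot a grouped boxplot from a categorical column in pandas so that only the categories present in the subset are shown, rather than all possible categories?
[Reproducible example]
I have a pandas DataFrame with a factor column that I want to use for a boxplot. Plotting by the factor works fine. If I take a subset and plot the boxplot by the factor, that also works, and only the factors present in the subset are drawn. However, if I convert the column to the category dtype, every category is drawn in the boxplot, even those that do not occur in the subset.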
import pandas as pd
import numpy as np
x = ['A']*150 + ['B']*150 + ['C']*150 + ['D']*150 + ['E']*150 + ['F']*150
y = np.random.randn(900)
z = ['X']*450 + ['Y']*450
df = pd.DataFrame({'Letter':x, 'N':y, 'type':z})
print(df.head())
print(df.tail())
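# plain string column: one box per letter, A to F
df.boxplot(by='Letter')
# subset on type == 'X': only A, B and C are drawn, as expected
df[df['type']=='X'].boxplot(by='Letter')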
df['Letter2'] = df['Letter'].copy()
df['Letter2'] = df['Letter2'].astype('category')
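# set the categories explicitly in order to sort the factor in a specific order
# (set_categories returns a new Series; the inplace= argument was removed in pandas 2.0)
df['Letter2'] = df['Letter2'].cat.set_categories(df['Letter2'].drop_duplicates().tolist()[::-1])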
# categorical column: all six categories are drawn, although the subset only contains A, B and C
df[df['type']=='X'].boxplot(by='Letter2')
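The behaviour can also be checked without plotting. The following is a minimal sketch that rebuilds the same data and compares the categories declared on the subset with the values that actually occur in it. The variable names (`subset`, `declared`, `present`, `trimmed`) are introduced here for illustration only. The last step uses `Series.cat.remove_unused_categories()`, which looks like a possible workaround, although I have not confirmed how the boxplot reacts to it.

```python
import numpy as np
import pandas as pd

# Same data layout as in the example above.
x = ['A']*150 + ['B']*150 + ['C']*150 + ['D']*150 + ['E']*150 + ['F']*150
z = ['X']*450 + ['Y']*450
df = pd.DataFrame({'Letter': x, 'N': np.random.randn(900), 'type': z})

# Categorical copy of the column, with the categories in reversed order.
df['Letter2'] = df['Letter'].astype('category')
df['Letter2'] = df['Letter2'].cat.set_categories(
    df['Letter'].drop_duplicates().tolist()[::-1]
)

subset = df[df['type'] == 'X']

# Categories declared on the column survive the row filter ...
declared = subset['Letter2'].cat.categories.tolist()
# ... even though only some of them occur in the subset.
present = subset['Letter'].unique().tolist()

# Assumed workaround: drop the categories that have no rows in the subset.
trimmed = subset['Letter2'].cat.remove_unused_categories()

print(declared)                         # ['F', 'E', 'D', 'C', 'B', 'A']
print(present)                          # ['A', 'B', 'C']
print(trimmed.cat.categories.tolist())  # ['C', 'B', 'A']
```

So the row filter leaves the category list untouched, which would explain why the boxplot still reserves a slot for D, E and F.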