Pandas plot sorts the binned values on the chart in the wrong order

Date: 2015-08-23 04:06:48

Tags: python pandas matplotlib plot

I'm using Pandas to plot a DataFrame with three columns: Interest, Gender, and Experience Points.

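I want to bin the Experience Points into specific ranges and then group the DataFrame by the binned values, Interest, and Gender. Then I want to plot the Interest counts for one specific gender (e.g. Male).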
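The binning and grouping described above can be sketched on a tiny, hand-made DataFrame. The values are hypothetical, chosen only so the counts are easy to check, and the sketch assumes a reasonably recent pandas where pd.cut returns a categorical and groupby accepts observed=True:

```python
import pandas as pd

# Small, deterministic stand-in for the question's random data (hypothetical values).
df = pd.DataFrame({
    "Interest": ["Swim", "Bike", "Bike", "Hike", "Swim", "Bike"],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    "Experience Points": [3, 7, 12, 15, 20, 5],
})

# Right-closed bins (0, 8], (8, 16], (16, 24], which is pd.cut's default closure.
exp_binned = pd.cut(df["Experience Points"], bins=[0, 8, 16, 24])

# Count rows for every (bin, Interest, Gender) combination that actually occurs.
counts = df.groupby([exp_binned, "Interest", "Gender"], observed=True).size()

# Keep one gender: one row per bin, one column per interest, 0 where nothing occurred.
male = counts.xs("Male", level="Gender").unstack("Interest", fill_value=0)
print(male)
```

The male table is exactly what the question then passes to plot(kind='bar').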

With the code below I get the plot I want, except that Pandas sorts the binned values on the x-axis in the wrong order (see the attached image).

[Image: the resulting bar chart, with the bins out of order on the x-axis]

Note that when I print the grouped counts, the binned values are in the correct order, but in the chart they are not.
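One plausible explanation, which neither the question nor the answer confirms, is that the bin labels end up being compared as plain strings somewhere between unstack() and the plot. Lexicographic order differs from numeric order for labels like these, as this plain-Python sketch shows (the bin width of 8 mirrors the question; the left_edge helper is hypothetical):

```python
# Bin labels in the numeric order pd.cut creates them: (0, 8], (8, 16], ..., (112, 120].
labels = ["({}, {}]".format(lo, lo + 8) for lo in range(0, 120, 8)]

# String sorting compares character by character,
# so "(104, 112]" lands before "(16, 24]" because '0' < '6'.
as_strings = sorted(labels)

def left_edge(label):
    """Hypothetical helper: parse "(16, 24]" into its numeric left edge, 16."""
    return int(label[1:].split(",")[0])

# Sorting by the numeric left edge restores the intended order.
as_numbers = sorted(labels, key=left_edge)

print(as_strings[:4])  # ['(0, 8]', '(104, 112]', '(112, 120]', '(16, 24]']
```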

Experience Points  Interest  Gender
(0, 8]             Bike      Female     9
                             Male       5
                   Hike      Female     6
                             Male      10
                   Swim      Female     7
                             Male       7
(8, 16]            Bike      Female     8
                             Male       3
                   Hike      Female     4
                             Male       7
                   Swim      Female    10
                             Male       4
(16, 24]           Bike      Female     4
                             Male       6
                   Hike      Female    10
...

My code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

matplotlib.style.use('ggplot')


interest = ['Swim','Bike','Hike']
gender = ['Male','Female']
experience_points = np.arange(0,200)

df = pd.DataFrame({'Interest':[random.choice(interest) for x in range(1000)],
                   'Gender':[random.choice(gender) for x in range(1000)],
                   'Experience Points':[random.choice(experience_points) for x in range(1000)]})

bins = np.arange(0,136,8)
exp_binned = pd.cut(df['Experience Points'],np.append(bins,df['Experience Points'].max()+1))

exp_distribution = df.groupby([exp_binned,'Interest','Gender']).size()

# Printed result has correct sorting by binned values
print(exp_distribution)

#Plotted dataframe has incorrect sorting of binned values 
exp_distribution.unstack(['Gender','Interest'])['Male'].plot(kind='bar') 

plt.show()

Troubleshooting steps:

Using plot(kind='bar', sort_columns=True) does not fix the problem.

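Grouping only by the binned values and then plotting DOES fix the problem, but then I can't also group by Interest or Gender. For example, this works: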

exp_distribution = df.groupby([exp_binned]).size()
exp_distribution.plot(kind='bar') 

1 Answer:

Answer 0 (score: 3)

unstack() scrambles the order, so you have to restore the original index order yourself. You may want to file a bug report about this.

Workaround:

# .loc replaces the .ix indexer, which was removed in pandas 1.0
exp_distribution.unstack(['Gender','Interest']).loc[exp_distribution.index.get_level_values(0).unique(),
                                                    'Male'].plot(kind='bar')
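The same "restore the index order" idea can be checked without plotting. This is a minimal sketch on hypothetical counts whose outer level uses string labels listed in numeric order; it reindexes the unstacked rows with the outer level's order of first appearance, just as the workaround does:

```python
import pandas as pd

# Hypothetical counts: the outer level is listed in numeric bin order,
# but its string labels do not sort lexicographically in that order.
index = pd.MultiIndex.from_tuples(
    [
        ("(0, 8]", "Bike", "Male"),
        ("(8, 16]", "Bike", "Male"),
        ("(16, 24]", "Bike", "Male"),
        ("(104, 112]", "Bike", "Male"),
    ],
    names=["Experience Points", "Interest", "Gender"],
)
counts = pd.Series([5, 3, 6, 2], index=index)

# unstack() rebuilds the row index from the remaining level, so the rows may come back sorted.
wide = counts.unstack(["Gender", "Interest"])

# unique() keeps the order of first appearance, i.e. the original bin order.
bin_order = counts.index.get_level_values(0).unique()

# Select rows in that order and keep only the "Male" block of columns.
male = wide.loc[bin_order, "Male"]
print(male)
```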

[Image: the bar chart produced by the workaround]