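I am using Pandas to plot a DataFrame that contains three types of columns: Interest, Gender, and Experience Points.
I want to bin the Experience Points into specific ranges, then group the DataFrame by the binned values, Interest, and Gender. After that, I want to plot the Interest counts for a specific Gender (e.g. Male).
With the code below I get the plot I want, except that Pandas sorts the binned values on the x-axis incorrectly (see the attached figure for what I mean).
Note that when I print the grouped result, the binned values are in the correct order, but in the plot they are sorted incorrectly.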
Experience Points  Interest  Gender
(0, 8]             Bike      Female     9
                             Male       5
                   Hike      Female     6
                             Male      10
                   Swim      Female     7
                             Male       7
(8, 16]            Bike      Female     8
                             Male       3
                   Hike      Female     4
                             Male       7
                   Swim      Female    10
                             Male       4
(16, 24]           Bike      Female     4
                             Male       6
                   Hike      Female    10
...
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random
matplotlib.style.use('ggplot')
interest = ['Swim','Bike','Hike']
gender = ['Male','Female']
experience_points = np.arange(0,200)
df = pd.DataFrame({'Interest': [random.choice(interest) for x in range(1000)],
                   'Gender': [random.choice(gender) for x in range(1000)],
                   'Experience Points': [random.choice(experience_points) for x in range(1000)]})
bins = np.arange(0,136,8)
exp_binned = pd.cut(df['Experience Points'],np.append(bins,df['Experience Points'].max()+1))
exp_distribution = df.groupby([exp_binned,'Interest','Gender']).size()
# Printed result has correct sorting by binned values
print(exp_distribution)
# Plotted result has incorrect sorting of binned values
exp_distribution.unstack(['Gender','Interest'])['Male'].plot(kind='bar')
plt.show()
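Troubleshooting steps:
Using plot(kind='bar', sort_columns=True) does not fix the problem.
Grouping by the binned values only and then plotting DOES fix the problem, but then I can no longer group by Interest or Gender. For example, the following works: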
exp_distribution = df.groupby([exp_binned]).size()
exp_distribution.plot(kind='bar')
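One thing that might be worth trying, as a sketch only: after unstacking, force the rows back into the order in which pd.cut created the bins, and only then plot. This assumes the order is lost because the bin labels end up compared as strings, which I have not confirmed. The names rng, edges, bin_labels, male and male_sorted are illustrative, and the data is seeded random data rather than my real DataFrame.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the DataFrame above (seeded, values are made up).
rng = np.random.RandomState(0)
df = pd.DataFrame({
    'Interest': rng.choice(['Swim', 'Bike', 'Hike'], size=1000),
    'Gender': rng.choice(['Male', 'Female'], size=1000),
    'Experience Points': rng.randint(0, 200, size=1000),
})

# Same bin edges as in the question: 0, 8, ..., 128, then max + 1.
edges = np.append(np.arange(0, 136, 8), df['Experience Points'].max() + 1)
exp_binned = pd.cut(df['Experience Points'], edges)

# Bin labels as strings, in the order pd.cut created them (ascending by edge).
bin_labels = [str(interval) for interval in exp_binned.cat.categories]

# Count rows per (bin, Interest, Gender); observed=False keeps empty bins as 0.
counts = df.groupby([exp_binned, 'Interest', 'Gender'], observed=False).size()

# Keep only the Male counts: rows are bins, columns are interests.
male = counts.unstack(['Gender', 'Interest'])['Male'].copy()

# Use plain string labels, then reindex explicitly in bin order.
male.index = [str(interval) for interval in male.index]
male_sorted = male.reindex(bin_labels)

# A plain string sort of the labels would NOT give this order
# ('(104, 112]' sorts before '(16, 24]'), which is the suspected cause.
print(sorted(bin_labels) == bin_labels)
print(list(male_sorted.index) == bin_labels)
```

Calling male_sorted.plot(kind='bar') afterwards should draw the bars in bin order, since the row order is now set explicitly rather than left to whatever sorting happens during unstack or plotting.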