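Styling pandas grouped boxplots

Time: 2013-10-18 16:00:08

Tags: python matplotlib pandas

The regular matplotlib boxplot command in Python returns a dictionary with keys for the boxes, medians, whiskers, fliers, and caps. This makes styling very easy.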

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe and subset it for a boxplot
df1 = pd.DataFrame(np.random.rand(10), columns=['Col1'])
df1['X'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])
boxes= [df1[df1['X'] == 'A'].Col1, df1[df1['X'] == 'B'].Col1]

# Call the standard matplotlib boxplot function,
# which returns a dictionary including the parts of the graph
mbp = plt.boxplot(boxes)
print(type(mbp))

# This dictionary output makes styling the boxplot easy
plt.setp(mbp['boxes'], color='blue')
plt.setp(mbp['medians'], color='red')
plt.setp(mbp['whiskers'], color='blue')
plt.setp(mbp['fliers'], color='blue')

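The pandas library offers an "optimized" boxplot function for its grouped (hierarchically indexed) DataFrames. Instead of returning one dictionary per group, however, it returns a matplotlib.axes.AxesSubplot object. This makes styling very difficult.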

# Pandas has a built-in boxplot function that returns
# a matplotlib.axes.AxesSubplot object
pbp = df1.boxplot(by='X')
print(type(pbp))

# Similar attempts at styling obviously return TypeErrors
plt.setp(pbp['boxes'], color='blue')
plt.setp(pbp['medians'], color='red')
plt.setp(pbp['whiskers'], color='blue')
plt.setp(pbp['fliers'], color='blue')
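
The failure described above can be reproduced in isolation. The following is a minimal sketch, assuming a recent pandas and matplotlib and a headless `Agg` backend (the backend choice and the `subscriptable` flag are illustrative additions, not part of the original question):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same shape of data as in the question: one value column, one grouping column
df1 = pd.DataFrame(np.random.rand(10), columns=["Col1"])
df1["X"] = ["A", "B"] * 5

# With a single plotted column, pandas hands back one Axes object
pbp = df1.boxplot(by="X")
print(type(pbp))

# An Axes is not a dictionary, so looking up 'boxes' fails
try:
    pbp["boxes"]
    subscriptable = True
except TypeError as err:
    subscriptable = False
    print("TypeError:", err)
```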

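Is there a way to access the parts of the AxesSubplot object produced by pandas' df.boxplot(by='X') so they can be styled?

2 answers:

Answer 0 (score: 7)

You can also specify return_type as 'dict'. This returns the boxplot properties directly in a dictionary, indexed by each column that was plotted in the boxplot.

Using the example above (in IPython):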

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot( by='X', return_type='dict' )

>>> bp.keys()
['Col1', 'Col2']

>>> bp['Col1'].keys()
['boxes', 'fliers', 'medians', 'means', 'whiskers', 'caps']

Now, changing the line widths is a matter of a list comprehension:

>>> [ [item.set_linewidth( 2 ) for item in bp[key]['medians']] for key in bp.keys() ]
[[None, None], [None, None]]
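
The same dictionaries can carry the colour styling from the question. Below is a minimal sketch, assuming a recent pandas where `return_type='dict'` together with `by` yields one dictionary of artists per plotted column. The `colors` mapping and the `Agg` backend are illustrative choices, and plain loops replace the list comprehension because only the side effect matters:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Two value columns and a grouping column, as in the answer
df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

# One dictionary of artists per plotted column
bp = df.boxplot(by="X", return_type="dict")

# Colours taken from the question's plain matplotlib example
colors = {"boxes": "blue", "medians": "red", "whiskers": "blue", "fliers": "blue"}
for column in bp.keys():
    for part, color in colors.items():
        plt.setp(bp[column][part], color=color)
```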

Answer 1 (score: 2)

I'm afraid you have to hard-code it. Take the pandas example: http://pandas.pydata.org/pandas-docs/stable/visualization.html#box-plotting

from pandas import *
import matplotlib
from numpy.random import rand
import matplotlib.pyplot as plt
df = DataFrame(rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = Series(['A','A','A','A','A','B','B','B','B','B'])
bp = df.boxplot(by='X')
cl=bp[0].get_children()
cl=[item for item in cl if isinstance(item, matplotlib.lines.Line2D)]

Now let's identify which one is the box, which the median, and so on:

for i, item in enumerate(cl):
    if item.get_xdata().mean()>0:
        bp[0].text(item.get_xdata().mean(), item.get_ydata().mean(), str(i), va='center', ha='center')

The plot looks like this, with each line labelled by its index in cl:

[Image: boxplot with numbered line items]

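Each box consists of 8 items. For example, the 5th item is the median. The 7th and 8th items are probably the fliers, which we don't have here.

Knowing this, modifying some part of the boxes is easy. For example, to set the medians to a linewidth of 2: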

n_classes = 2  # number of groups (boxes) per subplot; 2 in this example
for i in range(n_classes):
    cl[5+i*8].set_linewidth(2.)
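
The stride of 8 comes from reading the numbered plot, so it is worth checking how many line artists your installed matplotlib draws per box before hard-coding an index. The sketch below is one way to count them, assuming the default boxplot options. The names `n_boxes`, `per_box`, and `stride` are illustrative, and it uses the plain matplotlib `boxplot` dictionary rather than pandas:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import matplotlib.pyplot as plt

n_boxes = 2
data = [np.random.rand(5) for _ in range(n_boxes)]

fig, ax = plt.subplots()
parts = ax.boxplot(data)  # dict of artist lists keyed by part name

# Number of line artists each part contributes to a single box
per_box = {name: len(artists) // n_boxes for name, artists in parts.items()}

# Total line artists per box, i.e. the stride to use when indexing a flat list
stride = sum(per_box.values())
print(per_box, stride)
```

If the printed stride is not 8, the offsets in the loop above need to be adjusted to match.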