Grouped boxplots in pandas and seaborn

Date: 2016-02-29 01:08:49

Tags: python pandas boxplot seaborn

I have this pandas DataFrame:

     season           A         B         C         D
0   current   26.978912  0.039233  1.248607  0.025874
1   current   26.978912  0.039233  0.836786  0.025874
2   current   26.978912  0.039233  3.047536  0.025874
3   current   26.978912  0.039233  3.726964  0.025874
4   current   26.978912  0.039233  1.171393  0.025874
5   current   26.978912  0.039233  0.180929  0.025874
6   current   26.978912  0.039233  0.000000  0.025874
7   current   34.709560  0.039233  0.700893  0.025874
8   current  111.140200  0.306142  3.068286  0.169244
9   current  111.140200  0.306142  2.931107  0.169244
10  current  111.140200  0.306142  2.121893  0.169244
11  current  111.140200  0.306142  1.479464  0.169244
12  current  111.140200  0.306142  2.186821  0.169244
13  current  111.140200  0.306142  9.542714  0.169244
14  current  111.140200  0.306142  9.890750  0.169244
15  current  111.140200  0.306142  8.864857  0.169244
16     past   88.176415  0.257901  3.416059  0.141809
17     past   88.176415  0.257901  4.835357  0.141809
18     past   88.176415  0.257901  5.238097  0.141809
19     past   88.176415  0.257901  5.535355  0.141809
20     past   88.176415  0.257901  6.479523  0.141809
21     past   88.176415  0.257901  7.727862  0.141809
22     past   88.176415  0.257901  8.046811  0.141809
23     past   94.037913  0.308439  8.541000  0.163651
24     past  101.630141  0.363136  8.416895  0.192256
25     past  101.630141  0.363136  6.531005  0.192256
26     past  101.630141  0.363136  6.397497  0.192256
27     past  101.630141  0.363136  6.500077  0.192256
28     past  101.630141  0.363136  7.088469  0.192256
29     past  101.630141  0.363136  7.821852  0.192256
30     past  101.630141  0.363136  8.011082  0.192256
31     past  101.037817  0.417099  8.279735  0.212376
32     past   88.176415  0.257901  3.416059  0.141809
33     past   88.176415  0.257901  4.835357  0.141809
34     past   88.176415  0.257901  5.238097  0.141809
35     past   88.176415  0.257901  5.535355  0.141809
36     past   88.176415  0.257901  6.479523  0.141809
37     past   88.176415  0.257901  7.727862  0.141809
38     past   88.176415  0.257901  8.046811  0.141809
39     past   94.037913  0.308439  8.541000  0.163651
40     past  101.630141  0.363136  8.416895  0.192256
41     past  101.630141  0.363136  6.531005  0.192256
42     past  101.630141  0.363136  6.397497  0.192256
43     past  101.630141  0.363136  6.500077  0.192256
44     past  101.630141  0.363136  7.088469  0.192256
45     past  101.630141  0.363136  7.821852  0.192256
46     past  101.630141  0.363136  8.011082  0.192256
47     past  101.037817  0.417099  8.279735  0.212376

I plot it like this:

df.boxplot(by='season')


How can I make sure that the different panels get their own y-axis minimum and maximum? Also, how can I do this in seaborn?

1 Answer:

Answer 0: (score: 2)

OK, so the first thing you need is long-form data. Let's assume you start with this:

import numpy
import pandas
import seaborn
numpy.random.seed(0)

N = 100
seasons = ['winter', 'spring', 'summer', 'autumn']
df = pandas.DataFrame({
    'season': numpy.random.choice(seasons, size=N),
    'A': numpy.random.normal(4, 1.75, size=N),
    'B': numpy.random.normal(4, 4.5, size=N),
    'C': numpy.random.lognormal(0.5, 0.05, size=N),
    'D': numpy.random.beta(3, 1, size=N)
})

print(df.sample(7))

           A         B         C         D  season
85  7.236212  5.044815  1.845659  0.550943  autumn
13  4.749581  1.014348  1.707000  0.630618  autumn
0   1.014027  4.750031  1.637803  0.285781  winter
3   3.233370  8.250158  1.516189  0.973797  winter
44  6.062864 -0.969725  1.564768  0.954225  autumn
43  7.317806 -3.209259  1.699684  0.968950  spring
39  5.576446 -2.187281  1.735002  0.436692  winter

You can use the pandas.melt function to convert this to long-form data, with one row per (season, variable, value) observation.

lf = pandas.melt(df, value_vars=['A', 'B', 'C', 'D'], id_vars='season')
print(lf.sample(7))

     season variable     value
399  winter        D  0.238061
227  spring        C  1.656770
322  autumn        D  0.933299
121  autumn        B  4.393981
6    autumn        A  1.175679
5    autumn        A  5.360608
51   spring        A  5.709118
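To make the reshaping concrete, here is a minimal sketch on a tiny frame. The frame `wide` and its values are made up for illustration and are not part of the data above.

```python
import pandas

# Hypothetical three-row frame in the same wide layout as df
wide = pandas.DataFrame({
    'season': ['winter', 'spring', 'summer'],
    'A': [1.0, 2.0, 3.0],
    'B': [10.0, 20.0, 30.0],
})

# melt keeps 'season' as an identifier and stacks columns A and B:
# 3 original rows x 2 melted columns = 6 rows in the long frame
long = pandas.melt(wide, id_vars='season', value_vars=['A', 'B'])

print(long)
```

The former column names end up in the `variable` column, which is what lets a plotting function put each of them in its own panel.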

Then you can feed everything directly into seaborn.factorplot (renamed seaborn.catplot in seaborn 0.9):

fg = (
    pandas.melt(df, value_vars=['A', 'B', 'C', 'D'], id_vars='season')
        .pipe(
            (seaborn.factorplot, 'data'), # (<fxn>, <dataframe var>)
            kind='box',                   # type of plot we want
            x='season', order=seasons,    # x-values of the plots, in this order
            y='value', palette='BrBG_r',  # y-values and colors
            col='variable', col_wrap=2,   # 'A-D' in columns, wrap at 2nd col
            sharey=False,                 # tailor y-axes for each group
            notch=True, width=0.75,       # kwargs passed to boxplot
        )
)
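On recent seaborn versions, where factorplot is no longer available, roughly the same figure can be sketched with seaborn.catplot. This is a sketch under a few assumptions rather than the exact call above: it assumes seaborn 0.9 or later, switches matplotlib to the headless Agg backend so it runs without a display, and leaves out the palette and the boxplot keyword arguments to keep it short.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, no display needed

import numpy
import pandas
import seaborn

numpy.random.seed(0)

# Same example data as in the answer
N = 100
seasons = ['winter', 'spring', 'summer', 'autumn']
df = pandas.DataFrame({
    'season': numpy.random.choice(seasons, size=N),
    'A': numpy.random.normal(4, 1.75, size=N),
    'B': numpy.random.normal(4, 4.5, size=N),
    'C': numpy.random.lognormal(0.5, 0.05, size=N),
    'D': numpy.random.beta(3, 1, size=N)
})

# Long-form data: one row per (season, variable, value)
lf = pandas.melt(df, value_vars=['A', 'B', 'C', 'D'], id_vars='season')

fg = seaborn.catplot(
    data=lf, kind='box',
    x='season', order=seasons,   # categories on the x-axis, in this order
    y='value',
    col='variable', col_wrap=2,  # one panel per variable, two per row
    sharey=False,                # each panel gets its own y-axis limits
)
```

The returned object is a FacetGrid, so `fg.savefig(...)` can be used to write the figure to disk.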

The factorplot call above gives me one box plot panel per variable (A through D), each with its own y-axis range.
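If you would rather stay with pandas and matplotlib, one possible approach is to create the grid of axes yourself and draw one column per axes. The sketch below assumes the same example data as above and a headless Agg backend. The 2x2 layout and the figure size are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, no display needed

import matplotlib.pyplot as plt
import numpy
import pandas

numpy.random.seed(0)

# Same example data as in the answer
N = 100
seasons = ['winter', 'spring', 'summer', 'autumn']
df = pandas.DataFrame({
    'season': numpy.random.choice(seasons, size=N),
    'A': numpy.random.normal(4, 1.75, size=N),
    'B': numpy.random.normal(4, 4.5, size=N),
    'C': numpy.random.lognormal(0.5, 0.05, size=N),
    'D': numpy.random.beta(3, 1, size=N)
})

columns = ['A', 'B', 'C', 'D']

# plt.subplots does not share y-axes by default, so every panel
# is autoscaled to its own data
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

for ax, column in zip(axes.flat, columns):
    # Draw the grouped boxplot for a single column on its own axes
    df.boxplot(column=column, by='season', ax=ax)

fig.suptitle('')   # drop the automatic "Boxplot grouped by season" title
fig.tight_layout()
```

Because each call only sees one column, the y-limits of every panel follow that column's range instead of the range of all four columns together.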