如何用熊猫绘制两种箱形图?

时间:2020-06-25 16:41:08

标签: python pandas matplotlib

你好,我正在用熊猫绘制两个不同的箱形图:

<taskdef>

我得到以下两个情节:

enter image description here

enter image description here

问题是,是否可以在同一图上绘制6个箱形图,并对每个图的三个箱形图使用不同的颜色?

谢谢

1 个答案:

答案 0 :(得分:1)

  • 最简单的方法是将数据从宽格式转换为长格式,然后使用hue参数进行seaborn绘制。
  • pandas.wide_to_long
    • 必须有一个唯一的ID,因此要添加id列。
    • 要转换的列必须具有相似的stubnames,这就是为什么我将error移到列名的前面的原因。
      • 错误列名称将在一个列中,值将在单独的列中

导入和测试数据

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# setup data and dataframe
np.random.seed(365)
data = {'mod_lg': np.random.normal(0.3, .1, size=(30,)),
        'mod_rf': np.random.normal(0.05, .01, size=(30,)),
        'mod_bg': np.random.normal(0.02, 0.002, size=(30,)),
        'mean_train_score': np.random.normal(0.95, 0.3, size=(30,)),
        'mean_test_score': np.random.normal(0.86, 0.5, size=(30,))}

df = pd.DataFrame(data)
df['error_mean_test_score'] = [1] - df['mean_test_score']
df['error_mean_train_score'] = [1] - df['mean_train_score']
df["id"] = df.index

df = pd.wide_to_long(df, stubnames='mod', i='id', j='mode', sep='_', suffix='\D+').reset_index()
df["id"] = df.index

# display dataframe: this is probably what your dataframe looks like to generate your current plots
   id mode  mean_train_score  error_mean_test_score  mean_test_score  error_mean_train_score       mod
0   0   lg          0.663855              -0.343961         1.343961                0.336145  0.316792
1   1   lg          0.990114               0.472847         0.527153                0.009886  0.352351
2   2   lg          1.179775               0.324748         0.675252               -0.179775  0.381738
3   3   lg          0.693155               0.519526         0.480474                0.306845  0.470385
4   4   lg          1.191048              -0.128033         1.128033               -0.191048  0.085305

变换和绘制

  • error_score_name列包含error_mean_test_scoreerror_mean_train_score
  • 的后缀
  • error_score_value列包含值。
# convert df error columns to long format
dfl = pd.wide_to_long(df, stubnames='error', i='id', j='score', sep='_', suffix='\D+').reset_index(level=1)
dfl.rename(columns={'score': 'error_score_name', 'error': 'error_score_value'}, inplace=True)

# display dfl

   error_score_name  mean_train_score       mod  mean_test_score mode  error_score_value
id                                                                                      
0   mean_test_score          0.663855  0.316792         1.343961   lg          -0.343961
1   mean_test_score          0.990114  0.352351         0.527153   lg           0.472847
2   mean_test_score          1.179775  0.381738         0.675252   lg           0.324748
3   mean_test_score          0.693155  0.470385         0.480474   lg           0.519526
4   mean_test_score          1.191048  0.085305         1.128033   lg          -0.128033

# plot dfl
sns.boxplot(x='mode', y='error_score_value', data=dfl, hue='error_score_name')

enter image description here