Question

我有一个包含 50 多个特征的数据集，这些特征对应于腿部康复期间的特定运动。我将使用我们的康复设备的组与不使用它进行恢复的组进行了比较。该组包括具有 3 个诊断的患者，我想比较每个诊断之前（红色箱线图）和之后（蓝色箱线图）的箱线图。这是我使用的代码片段和我得到的输出。

对照组数据：

dataKONTR 
           Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
0    Williams1    distorze  0.0  ...          63         57          78
1    Williams2    distorze  0.0  ...          91         68          67
2    Norton1           LCA  1.0  ...          58         90          64
3    Norton2           LCA  1.0  ...          29         91          87
4    Chavender1   distorze  1.0  ...          61         56          75
5    Chavender2   distorze  1.0  ...          54         74          80
6    Bendis1      distorze  1.0  ...          32         57          97
7    Bendis2      distorze  1.0  ...          55         69          79
8    Shawn1             AS  1.0  ...          15         74          75
9    Shawn2             AS  1.0  ...          67         86          79
10   Cichy1            LCA  0.0  ...          45         83          80

这是我使用的代码段和我得到的输出。

temp = "c:/Users/novos/ŠKOLA/Statistika/data Mariana/%s.xlsx"

dataKU = pd.read_excel(temp % "VestlabEXP_KU", engine = "openpyxl", skipfooter= 85)     # patients using our rehabilitation tool
dataKONTR = pd.read_excel(temp % "VestlabEXP_kontr", engine = "openpyxl", skipfooter=51)    # control group

dataKU_diag = dataKU.dropna()
dataKONTR_diag = dataKONTR.dropna()


dataKUBefore = dataKU_diag[dataKU_diag['Row'].str.contains("1")]        # Patients data ending with 1 are before rehab
dataKUAfter = dataKU_diag[dataKU_diag['Row'].str.contains("2")]         # Patients data ending with 2 are before rehab

dataKONTRBefore = dataKONTR_diagL[dataKONTR_diag['Row'].str.contains("1")]  
dataKONTRAfter = dataKONTR_diagL[dataKONTR_diag['Row'].str.contains("2")]

b1 = dataKUBefore.boxplot(column=list(dataKUBefore.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='r', whiskers='r', medians='r', caps='r'),layout=(2,4),return_type='axes')


plt.ylim(0.5, 1.5)
plt.suptitle("")
plt.suptitle("Before, KU")

b2 = dataKUAfter.boxplot(column=list(dataKUAfter.filter(regex='LOS_RT')), by="DG", rot = 45, color=dict(boxes='b', whiskers='b', medians='b', caps='b'),layout=(2,4),return_type='axes')
# dataKUPredP
plt.suptitle("")
plt.suptitle("After, KU")
plt.ylim(0.5, 1.5)
plt.show()

输出是两个数字（红色箱线图是所有“康复前”数据，蓝色箱线图是所有“康复后”数据）

你能帮我制作每个诊断的红色和蓝色箱线图吗？

感谢您的帮助。

Answer 1

您可以尝试将数据连接成单个数据帧：

dataKUPlot = pd.concat({
    'Before': dataKUBefore,
    'After': dataKUAfter,
}, names=['When'])

您应该会在输出中看到名为 When 的附加索引级别。使用您发布的示例数据，它看起来像这样：

>>> pd.concat({'Before': df, 'After': df}, names=['When'])
                 Row        DG  DKK  ...  LOS_DCL_LB  LOS_DCL_L  LOS_DCL_LF
When                                                                       
Before 0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75
After  0   Williams1  distorze  0.0  ...          63         57          78
       1   Williams2  distorze  0.0  ...          91         68          67
       2     Norton1       LCA  1.0  ...          58         90          64
       3     Norton2       LCA  1.0  ...          29         91          87
       4  Chavender1  distorze  1.0  ...          61         56          75

然后，您可以通过修改 by 石斑鱼，使用单个命令绘制所有框，从而绘制相同的图：

dataKUAfter.boxplot(column=dataKUPlot.filter(regex='LOS_RT').columns.to_list(), by=['DG', 'When'], rot = 45, layout=(2,4), return_type='axes')

我相信这是唯一的“简单”方式，恐怕看起来有点混乱：

任何其他方式都意味着使用 matplotlib 进行手动绘图 - 从而更好地控制。例如迭代所有需要的列：

fig, axes = plt.subplots(nrows=2, ncols=3, sharey=True, sharex=True)
pos = 1 + np.arange(max(dataKUBefore['DG'].nunique(), dataKUAfter['DG'].nunique()))
redboxes = {f'{x}props': dict(color='r') for x in ['box', 'whisker', 'median', 'cap']}
blueboxes = {f'{x}props': dict(color='b') for x in ['box', 'whisker', 'median', 'cap']}

ax_it = axes.flat
for colname, ax in zip(dataKUBefore.filter(regex='LOS_RT').columns, ax_it):
    # Making a dataframe here to ensure the same ordering
    show = pd.DataFrame({
        'before': dataKUBefore[colname].groupby(dataKUBefore['DG']).agg(list),
        'after': dataKUAfter[colname].groupby(dataKUAfter['DG']).agg(list),
    })

    ax.boxplot(show['before'].values, positions=pos - .15, **redboxes)
    ax.boxplot(show['after'].values, positions=pos + .15, **blueboxes)

    ax.set_xticks(pos)
    ax.set_xticklabels(show.index, rotation=45) 
    ax.set_title(colname)
    ax.grid(axis='both')

# Hide remaining axes:
for ax in ax_it:
    ax.axis('off')

Answer 2

您可以添加一个新列来分隔“之前”和“之后”。 Seaborn 的 boxplot 可以将该新列用作 hue。 sns.catplot(kind='box', ...) 创建一个由 boxplot 组成的网格：

import seaborn as sns
import pandas as pd
import numpy as np

names = ['Adams', 'Arthur', 'Buchanan', 'Buren', 'Bush', 'Carter', 'Cleveland', 'Clinton', 'Coolidge', 'Eisenhower', 'Fillmore', 'Ford', 'Garfield', 'Grant', 'Harding', 'Harrison', 'Hayes', 'Hoover', 'Jackson', 'Jefferson', 'Johnson', 'Kennedy', 'Lincoln', 'Madison', 'McKinley', 'Monroe', 'Nixon', 'Obama', 'Pierce', 'Polk', 'Reagan', 'Roosevelt', 'Taft', 'Taylor', 'Truman', 'Trump', 'Tyler', 'Washington', 'Wilson']
rows = np.array([(name + '1', name + '2') for name in names]).flatten()
dataKONTR = pd.DataFrame({'Row': rows,
                          'DG': np.random.choice(['AS', 'Distorze', 'LCA'], len(rows)),
                          'LOS_RT_A': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_B': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_C': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_D': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_E': np.random.randint(15, 100, len(rows)),
                          'LOS_RT_F': np.random.randint(15, 100, len(rows))})
dataKONTR = dataKONTR.dropna()
dataKONTR['When'] = ['Before' if r[-1] == '1' else 'After' for r in dataKONTR['Row']]
cols = [c for c in dataKONTR.columns if 'LOS_RT' in c]

df_long = dataKONTR.melt(value_vars=cols, var_name='Which', value_name='Value', id_vars=['When', 'DG'])
g = sns.catplot(kind='box', data=df_long, x='DG', col='Which', col_wrap=3, y='Value', hue='When')
g.set_axis_labels('', '') # remove the x and y labels

如何嵌套箱线图 groupBy

2 个答案: