
Time: 2020-03-24 20:50:31

Tags: python pandas seaborn


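  1. Under three test conditions for which the instruments are known to show different responses (test_1, test_2, test_3),
  2. using two different makes and models of instrument (foo, bar),
  3. repeating the measurements on several units of each instrument model they own, each with its own serial number, and including measurements on instruments the labs lent each other for an interlaboratory comparison (the same serial number measured in both labs).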


Adapting the pandas cookbook examples on MultiIndexing, this is how each lab reports its data:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot

# Lab_1: one column per instrument, named <model>_<serial number>
df = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                   'foo_110': [1.1, 1.18, 1.19],
                   'foo_112': [1.15, 1.25, 1.25],
                   'bar_888': [1.11, 1.15, 1.16],
                   'bar_657': [1.14, 1.16, 1.18]})
# Lab_2: foo_112 and bar_888 were also measured in Lab_1
df1 = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                    'foo_105': [1.13, 1.17, 1.18],
                    'foo_112': [1.16, 1.26, 1.28],
                    'foo_167': [1.18, 1.23, 1.27],
                    'bar_888': [1.10, 1.14, 1.18],
                    'bar_415': [1.12, 1.15, 1.16]})

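To prepare the data for the Seaborn plot, each dataframe is restructured by calling stack() on its indexed form, and the results are concatenated along axis=0: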

df = df.set_index('test')
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
df = df.stack().reset_index()

df1 = df1.set_index('test')
df1.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df1.columns])
df1 = df1.stack().reset_index()

dfAll = pd.concat((df, df1), axis = 0, sort= False)
dfAll.columns = ['test', 's.no.', 'bar', 'foo']


     test s.no.   bar   foo
0  test_1   110   NaN  1.10
1  test_1   112   NaN  1.15
2  test_1   657  1.14   NaN
3  test_1   888  1.11   NaN
4  test_2   110   NaN  1.18
5  test_2   112   NaN  1.25
6  test_2   657  1.16   NaN
7  test_2   888  1.15   NaN
8  test_3   110   NaN  1.19
9  test_3   112   NaN  1.25

Plotting all data for the "bar" instrument:

dfAllplot = sns.catplot(x="test", y ="bar", data=dfAll, hue='s.no.')

Results for instrument type **bar**, Lab_1 and Lab_2 combined



df['Lab'] = 'Lab_1' 
df1['Lab'] = 'Lab_2'

Same as above, but with hue set to 'Lab'
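
For the hue='Lab' plot, the 'Lab' column has to be on every row before the two tables are concatenated. A minimal sketch of that step, assuming the wide per-lab tables shown earlier. The helper names to_long and plot_bar_by_lab are made up for illustration, and seaborn is only imported when the plot function is actually called:

```python
import pandas as pd


def to_long(wide, lab):
    """Stack one lab's wide table and tag every row with the lab name."""
    wide = wide.set_index('test')
    # Split 'foo_110' into ('foo', '110') and name both column levels
    wide.columns = pd.MultiIndex.from_tuples(
        [tuple(c.split('_')) for c in wide.columns], names=['model', 's.no.'])
    # Move the serial number from the columns into the rows
    long = wide.stack('s.no.').reset_index()
    long['Lab'] = lab
    return long


lab_1 = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                      'foo_110': [1.1, 1.18, 1.19],
                      'foo_112': [1.15, 1.25, 1.25],
                      'bar_888': [1.11, 1.15, 1.16],
                      'bar_657': [1.14, 1.16, 1.18]})
lab_2 = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                      'foo_105': [1.13, 1.17, 1.18],
                      'foo_112': [1.16, 1.26, 1.28],
                      'foo_167': [1.18, 1.23, 1.27],
                      'bar_888': [1.10, 1.14, 1.18],
                      'bar_415': [1.12, 1.15, 1.16]})

df_all = pd.concat([to_long(lab_1, 'Lab_1'), to_long(lab_2, 'Lab_2')],
                   ignore_index=True)


def plot_bar_by_lab(data):
    """Draw the 'bar' results, coloured by lab (requires seaborn)."""
    import seaborn as sns
    return sns.catplot(x='test', y='bar', hue='Lab', data=data)
```

Naming the column levels means the serial-number column arrives already labelled 's.no.', so no positional renaming of the columns is needed afterwards.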


df1 = df1.set_index('test')
df1.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df1.columns])
df1['urel'] = [0.015, 0.014, 0.013]


         foo               bar         urel
         105   112   167   888   415       
test_1  1.13  1.16  1.18  1.10  1.12  0.015
test_2  1.17  1.26  1.23  1.14  1.15  0.014
test_3  1.18  1.28  1.27  1.18  1.16  0.013


df1 = df1.stack().reset_index()
df1['Lab'] = 'Lab_2'


     test level_1   bar   foo   urel    Lab
0  test_1     105   NaN  1.13    NaN  Lab_2
1  test_1     112   NaN  1.16    NaN  Lab_2
2  test_1     167   NaN  1.18    NaN  Lab_2
3  test_1     415  1.12   NaN    NaN  Lab_2
4  test_1     888  1.10   NaN    NaN  Lab_2
5  test_1           NaN   NaN  0.015  Lab_2
6  test_2     105   NaN  1.17    NaN  Lab_2
7  test_2     112   NaN  1.26    NaN  Lab_2
8  test_2     167   NaN  1.23    NaN  Lab_2
9  test_2     415  1.15   NaN    NaN  Lab_2

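When should 'urel' be added to the dataframe? If it is added before MultiIndexing, i.e. right at the start, then the MultiIndexing, stacking and index reset shown here "break" 'urel' again. Or is stack() simply not the right approach for the example given here?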

2 Answers:

Answer 0: (score: 0)



df = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                   'foo_110': [1.1, 1.18, 1.19],
                   'foo_112': [1.15, 1.25, 1.25],
                   'bar_888': [1.11, 1.15, 1.16],
                   'bar_657': [1.14, 1.16, 1.18],
                   'urel': [0.020, 0.025, 0.018],
                   'HVL': [0.156, 0.180, 0.195]})
df = df.set_index('test')


def stack_for_seaborn(df, separate=('urel', 'HVL')):
    """Alternative stack() of a pandas dataframe for seaborn plotting.

    First, the columns in 'separate' are set aside from the stacking.
    Second, stack() is applied to the remaining columns.
    Third, the 'separate' columns are appended again, aligned on the index.
    """
    separate = list(separate)
    addnl = df[separate]
    # drop() returns a copy, so the caller's dataframe is left untouched
    df = df.drop(columns=separate)
    df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
    df = df.stack().reset_index().set_index('test')
    # join aligns on 'test', repeating each per-test value on all rows of that test
    df = df.join(addnl)
    return df.reset_index()

df = stack_for_seaborn(df)
df['Lab'] = 'Lab_1'
# reset_index() names the unnamed serial-number level 'level_1'
df = df.rename(columns={'level_1': 's.no.'})


     test s.no.   bar   foo   urel    HVL    Lab
0  test_1   110   NaN  1.10  0.020  0.156  Lab_1
1  test_1   112   NaN  1.15  0.020  0.156  Lab_1
2  test_1   657  1.14   NaN  0.020  0.156  Lab_1
3  test_1   888  1.11   NaN  0.020  0.156  Lab_1
4  test_2   110   NaN  1.18  0.025  0.180  Lab_1
5  test_2   112   NaN  1.25  0.025  0.180  Lab_1
6  test_2   657  1.16   NaN  0.025  0.180  Lab_1
7  test_2   888  1.15   NaN  0.025  0.180  Lab_1
8  test_3   110   NaN  1.19  0.018  0.195  Lab_1
9  test_3   112   NaN  1.25  0.018  0.195  Lab_1
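
Another option, sketched here as an assumption rather than as part of the answer above: since 'urel' has one value per test, it can be placed in the index together with 'test' before the columns are split. stack() then only touches the instrument columns, and reset_index() brings 'urel' back as an ordinary column on every row. The variable names wide and long are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'test': ['test_1', 'test_2', 'test_3'],
                    'foo_105': [1.13, 1.17, 1.18],
                    'foo_112': [1.16, 1.26, 1.28],
                    'foo_167': [1.18, 1.23, 1.27],
                    'bar_888': [1.10, 1.14, 1.18],
                    'bar_415': [1.12, 1.15, 1.16],
                    'urel': [0.015, 0.014, 0.013]})

# Per-test quantities go into the index, so they are not stacked
wide = df1.set_index(['test', 'urel'])
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split('_')) for c in wide.columns], names=['model', 's.no.'])

# Only the serial-number level moves from the columns into the rows
long = wide.stack('s.no.').reset_index()
long['Lab'] = 'Lab_2'
```

Further per-test columns such as 'HVL' could be added to the set_index() list in the same way.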

Answer 1: (score: 0)


This solution uses the DataFrame melt() method. The two dataframes are redefined so that their columns are easier to handle with melt():

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# The four identifier columns come first so they can be selected by position
df_1 = pd.DataFrame({'urel': [0.020, 0.025, 0.018],
                     'HVL': [0.156, 0.180, 0.195],
                     'test': ['test_1', 'test_2', 'test_3'],
                     'Lab': ['Lab_1', 'Lab_1', 'Lab_1'],
                     'foo_110': [1.1, 1.18, 1.19],
                     'foo_112': [1.15, 1.25, 1.25],
                     'bar_657': [1.14, 1.16, 1.18],
                     'bar_888': [1.11, 1.15, 1.16]})

df_2 = pd.DataFrame({'urel': [0.020, 0.025, 0.018],
                     'HVL': [0.156, 0.180, 0.195],
                     'test': ['test_1', 'test_2', 'test_3'],
                     'Lab': ['Lab_2', 'Lab_2', 'Lab_2'],
                     'foo_105': [1.13, 1.17, 1.18],
                     'foo_112': [1.16, 1.26, 1.28],
                     'foo_167': [1.18, 1.23, 1.27],
                     'bar_888': [1.10, 1.14, 1.18],
                     'bar_415': [1.12, 1.15, 1.16]})

for df in (df_1, df_2):
    df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])

# The first four columns are the identifiers; all others are melted
df_1 = df_1.melt(id_vars=df_1.columns.tolist()[:4],
                 var_name=['model', 'ser.no'])
df_2 = df_2.melt(id_vars=df_2.columns.tolist()[:4],
                 var_name=['model', 'ser.no'])

colnames = ['urel', 'HVL', 'test', 'Lab', 'model', 'ser.no', 'value']
df_1.columns = colnames
df_2.columns = colnames
# DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
dfAll = pd.concat([df_1, df_2], ignore_index=True)


    urel    HVL    test    Lab model ser.no  value
0  0.020  0.156  test_1  Lab_1   foo    110   1.10
1  0.025  0.180  test_2  Lab_1   foo    110   1.18
2  0.018  0.195  test_3  Lab_1   foo    110   1.19
3  0.020  0.156  test_1  Lab_1   foo    112   1.15
4  0.025  0.180  test_2  Lab_1   foo    112   1.25
5  0.018  0.195  test_3  Lab_1   foo    112   1.25
6  0.020  0.156  test_1  Lab_1   bar    657   1.14
7  0.025  0.180  test_2  Lab_1   bar    657   1.16
8  0.018  0.195  test_3  Lab_1   bar    657   1.18
9  0.020  0.156  test_1  Lab_1   bar    888   1.11



The image resulting from the catplot after using the melt() method
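
A variant of the melt() approach that avoids MultiIndex columns altogether, given as a hedged sketch: melt on the flat column names first, then split the melted name into model and serial number. The names ID_COLS, melt_lab and plot_long are made up for illustration, the intermediate column name 'instrument' is an assumption, and the catplot arguments are only one possible layout:

```python
import pandas as pd

ID_COLS = ['urel', 'HVL', 'test', 'Lab']


def melt_lab(wide):
    """Melt one lab's wide table into one row per (test, instrument)."""
    long = wide.melt(id_vars=ID_COLS, var_name='instrument', value_name='value')
    # 'foo_110' -> model 'foo', serial number '110'
    long[['model', 'ser.no']] = long['instrument'].str.split('_', expand=True)
    return long[ID_COLS + ['model', 'ser.no', 'value']]


lab_1 = pd.DataFrame({'urel': [0.020, 0.025, 0.018],
                      'HVL': [0.156, 0.180, 0.195],
                      'test': ['test_1', 'test_2', 'test_3'],
                      'Lab': ['Lab_1', 'Lab_1', 'Lab_1'],
                      'foo_110': [1.1, 1.18, 1.19],
                      'foo_112': [1.15, 1.25, 1.25],
                      'bar_657': [1.14, 1.16, 1.18],
                      'bar_888': [1.11, 1.15, 1.16]})
lab_2 = pd.DataFrame({'urel': [0.020, 0.025, 0.018],
                      'HVL': [0.156, 0.180, 0.195],
                      'test': ['test_1', 'test_2', 'test_3'],
                      'Lab': ['Lab_2', 'Lab_2', 'Lab_2'],
                      'foo_105': [1.13, 1.17, 1.18],
                      'foo_112': [1.16, 1.26, 1.28],
                      'foo_167': [1.18, 1.23, 1.27],
                      'bar_888': [1.10, 1.14, 1.18],
                      'bar_415': [1.12, 1.15, 1.16]})

df_all = pd.concat([melt_lab(lab_1), melt_lab(lab_2)], ignore_index=True)


def plot_long(data):
    """One panel per model, points coloured by serial number (requires seaborn)."""
    import seaborn as sns
    return sns.catplot(x='test', y='value', hue='ser.no', col='model', data=data)
```

Because the identifier columns are selected by name, their position in the wide table no longer matters.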
