Pandas: avoiding a hierarchy in the columns of a pivot table

Time: 2016-04-24 06:46:45

Tags: python pandas

I have a pandas DataFrame df, from which I build a pivot table with the following function:

import pandas as pd

def objective2(excel_file):
    df = pd.read_excel(excel_file)

    # Bin WBC into three groups using the cut-offs 4 and 12
    df['WBC_groups'] = pd.cut(df.WBC, [0, 4, 12, 100],
                              labels=['WBC < 4', 'WBC Normal', 'WBC > 12'])

    # Helper column: summing it counts the rows that fall into each cell
    df['count'] = 1

    table = df.pivot_table('count', index=['Sex'],
                           columns=['WBC_groups', 'Outcome_at_24'],
                           aggfunc='sum',
                           margins=True, margins_name='Total')

    return table

This produces the following table:

WBC_groups         WBC < 4      WBC Normal      WBC > 12      Total
Outcome_at_24   Alive Died      Alive Died    Alive Died       
Sex                                                            
Female           10.0  2.0       20.0  6.0     14.0  NaN       86.0
Male              3.0  NaN       28.0  3.0     26.0  4.0      111.0
Total            13.0  2.0       48.0  9.0     40.0  4.0      197.0

How can I avoid the hierarchy in the columns, so that the table looks like this:

WBC_groups  WBC < 4  WBC Normal  WBC > 12  Alive  Died  Total
Sex
Female         10.0         2.0      20.0    6.0  14.0   86.0
Male            3.0         NaN      28.0    3.0  26.0  111.0
Total          13.0         2.0      48.0    9.0  40.0  197.0

Note: the data in the tables is not accurate; it is dummy data only.

1 answer:

Answer 0 (score: 2)

I don't think you can avoid the hierarchy, because the columns parameter of pivot_table is given two columns: WBC_groups and Outcome_at_24.
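A minimal sketch of that point, assuming a small made-up DataFrame in place of the Excel file (the rows below are hypothetical and only reuse the column names from the question): two entries in columns produce a two-level column index, while a single entry produces a flat one.

```python
import pandas as pd

# Hypothetical toy data standing in for the Excel file; the values are made up.
df = pd.DataFrame({
    'Sex': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'WBC_groups': ['WBC < 4', 'WBC Normal', 'WBC Normal', 'WBC > 12', 'WBC > 12'],
    'Outcome_at_24': ['Alive', 'Died', 'Alive', 'Alive', 'Died'],
})
df['count'] = 1

# Two entries in `columns`: the result has a two-level (MultiIndex) column index.
two_level = df.pivot_table('count', index='Sex',
                           columns=['WBC_groups', 'Outcome_at_24'],
                           aggfunc='sum')

# One entry in `columns`: the result has an ordinary flat column index.
one_level = df.pivot_table('count', index='Sex',
                           columns='WBC_groups',
                           aggfunc='sum')

print(two_level.columns.nlevels)  # 2
print(one_level.columns.nlevels)  # 1
```

So the hierarchy comes from the shape of the request itself, and flattening has to happen after the pivot table is built.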

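The simplest solution is to set new column names and then drop the rem column (here df is the pivot table returned by the function above):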

# One flat name per existing column; 'rem' is a placeholder that is dropped next
df.columns = ['WBC < 4', 'WBC Normal', 'WBC > 12', 'Alive', 'Died', 'rem', 'Total']
df = df.drop('rem', axis=1)
print(df)
        WBC < 4  WBC Normal  WBC > 12  Alive  Died  Total
Sex                                                      
Female     10.0         2.0      20.0    6.0  14.0   86.0
Male        3.0         NaN      28.0    3.0  26.0  111.0
Total      13.0         2.0      48.0    9.0  40.0  197.0

But if you need a more general solution:

print(df)
WBC_groups    WBC < 4      WBC Normal      WBC > 12       Total
Outcome_at_24   Alive Died      Alive Died    Alive Died       
Sex                                                            
Female           10.0  2.0       20.0  6.0     14.0  NaN   86.0
Male              3.0  NaN       28.0  3.0     26.0  4.0  111.0
Total            13.0  2.0       48.0  9.0     40.0  4.0  197.0

# Unique labels of the first column level, in their original order
cols1 = df.columns.get_level_values('WBC_groups').to_series().drop_duplicates().tolist()
print(cols1)
['WBC < 4', 'WBC Normal', 'WBC > 12', 'Total']

# Unique labels of the second column level, in their original order
cols2 = df.columns.get_level_values('Outcome_at_24').to_series().drop_duplicates().tolist()
print(cols2)
['Alive', 'Died', ' ']

# Flat names: the groups, the outcomes, a 'rem' placeholder, then 'Total'
cols = cols1[:-1] + cols2[:2] + ['rem'] + cols1[-1:]
print(cols)
['WBC < 4', 'WBC Normal', 'WBC > 12', 'Alive', 'Died', 'rem', 'Total']

df.columns = cols

df = df.drop('rem', axis=1)
print(df)
        WBC < 4  WBC Normal  WBC > 12  Alive  Died  Total
Sex                                                      
Female     10.0         2.0      20.0    6.0  14.0   86.0
Male        3.0         NaN      28.0    3.0  26.0  111.0
Total      13.0         2.0      48.0    9.0  40.0  197.0
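
A different way to get flat columns, not part of the answer above, is to join the two level labels of each column into one string, so that every column keeps both its WBC group and its outcome. The sketch below rebuilds the pivot table by hand from the dummy values in the question. The names table and flat and the ' / ' separator are illustrative choices, and it assumes the second-level label of the Total column is an empty or blank string.

```python
import pandas as pd

nan = float('nan')

# Hand-built stand-in for the pivot table, using the dummy values from the question.
# Assumption: the second-level label of the 'Total' column is an empty string.
columns = pd.MultiIndex.from_tuples(
    [('WBC < 4', 'Alive'), ('WBC < 4', 'Died'),
     ('WBC Normal', 'Alive'), ('WBC Normal', 'Died'),
     ('WBC > 12', 'Alive'), ('WBC > 12', 'Died'),
     ('Total', '')],
    names=['WBC_groups', 'Outcome_at_24'])
table = pd.DataFrame(
    [[10.0, 2.0, 20.0, 6.0, 14.0, nan, 86.0],
     [3.0, nan, 28.0, 3.0, 26.0, 4.0, 111.0],
     [13.0, 2.0, 48.0, 9.0, 40.0, 4.0, 197.0]],
    index=pd.Index(['Female', 'Male', 'Total'], name='Sex'),
    columns=columns)

# Each column label is a tuple such as ('WBC < 4', 'Alive').
# Join its non-blank parts into a single string to get a flat index.
flat = table.copy()
flat.columns = [' / '.join(part for part in col if part.strip())
                for col in table.columns]

print(flat.columns.tolist())
# ['WBC < 4 / Alive', 'WBC < 4 / Died', 'WBC Normal / Alive',
#  'WBC Normal / Died', 'WBC > 12 / Alive', 'WBC > 12 / Died', 'Total']
```

This does not produce the exact layout asked for in the question, but it needs no placeholder column and works for any number of groups and outcomes.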