How to sort the column headers of a multi-index pivot table using a list

Time: 2019-05-17 20:34:31

Tags: python pandas

I am trying to sort the columns of a pivot table according to lists that hold my preferred order. Example below:

import pandas as pd

df = pd.DataFrame({'Name':['name1', 'name2', 'name1', 'name2', 'name2','name2'], 
                   'Block':['Block 1','Block 1', 'Block 10','Block 2','Block 2','Block 2'], 
                   'Week':['wk1','wk2','wk42','wk11','wk9','wk8'],
                   'Date':['01/15/2020','01/20/2020','11/29/2020','05/01/2020','04/20/2020','04/15/2020'],
                   'Events':['SIR','','','RSNA', '','CORE'],
                   'Rotation':['ABD','MAM','ER','UMH','PEDI','VIR']
                  })


df_summary = df.pivot_table(index=['Rotation'], columns=['Block','Week','Date','Events'], values='Name', aggfunc="count").fillna(0).astype(int)

This produces the following pivot table:

[image: pivot]

The pivot table's columns are not in my preferred order. I want to sort them according to these lists:

blocks = ['Block 1','Block 2','Block 10']
weeks = ['wk1','wk2','wk8','wk9','wk11','wk42']
dates = ['01/15/2020','01/20/2020','04/15/2020','04/20/2020','05/01/2020','11/29/2020']

So I tried .reindex (see below), but I keep getting an error: TypeError: Expected tuple, got str

df_summary = df_summary.reindex(columns=blocks)

df_summary = df_summary.reindex(columns=blocks,weeks,dates)

Is it possible to reindex with lists? Should I try reindexing with a dictionary instead? Any help would be greatly appreciated!

2 answers:

Answer 0: (score: 2)

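Simply use pd.crosstab together with natsorted. Block is converted to an ordered categorical so that "Block 2" sorts before "Block 10"; the remaining levels are still sorted as plain strings, which is why wk11 comes before wk8 in the output below.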

from natsort import natsorted
df.Block=pd.Categorical(df.Block,categories=natsorted(df.Block.unique()),ordered=True)
s=pd.crosstab(df.Rotation,[df.Block,df.Week,df.Date,df.Events]).sort_index(level=0,axis=1)
s
Out[305]: 
Block       Block 1               Block 2                         Block 10
Week            wk1        wk2       wk11        wk8        wk9       wk42
Date     01/15/2020 01/20/2020 05/01/2020 04/15/2020 04/20/2020 11/29/2020
Events          SIR                  RSNA       CORE                      
Rotation                                                                  
ABD               1          0          0          0          0          0
ER                0          0          0          0          0          1
MAM               0          1          0          0          0          0
PEDI              0          0          0          0          1          0
UMH               0          0          1          0          0          0
VIR               0          0          0          1          0          0
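
If the weeks also need to appear in numeric order, or natsort is not available, one possible approach is to sort the column tuples with a hand-written natural-sort key. The following is only a sketch on toy data: `natural_key`, `table`, and the values in it are illustrative names and numbers, not part of the original answer, and the key assumes every label in a level has the same text-then-number shape.

```python
import re

import pandas as pd


def natural_key(label):
    """Split a label into text and integer chunks, e.g. 'wk11' -> ['wk', 11, '']."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', label)]


# Toy two-level columns in the lexicographic order pandas produces by default.
cols = pd.MultiIndex.from_tuples(
    [('Block 1', 'wk1'), ('Block 1', 'wk2'), ('Block 10', 'wk42'),
     ('Block 2', 'wk11'), ('Block 2', 'wk8'), ('Block 2', 'wk9')],
    names=['Block', 'Week'],
)
table = pd.DataFrame([[1, 2, 3, 4, 5, 6]], index=['row'], columns=cols)

# Sort whole column tuples, applying the natural key to every level.
ordered = sorted(table.columns,
                 key=lambda col: tuple(natural_key(level) for level in col))

# Reindex with a MultiIndex built from the sorted tuples.
table_sorted = table.reindex(
    columns=pd.MultiIndex.from_tuples(ordered, names=table.columns.names)
)
print(table_sorted)
```

Because the full tuple is sorted, this orders both Block and Week numerically without converting any column to a categorical.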

Answer 1: (score: 0)

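This solution builds a MultiIndex object from your specified lists and then passes it to DataFrame.reindex(). The events must be included as well, because they are part of the original column index. Every column label in a MultiIndex is a tuple with one entry per level, which is why passing a plain list of strings raised the TypeError.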

blocks = 2*['Block 1'] + 3*['Block 2'] + ['Block 10']
weeks = ['wk1','wk2','wk8','wk9','wk11','wk42']
dates = ['01/15/2020','01/20/2020','04/15/2020','04/20/2020','05/01/2020','11/29/2020']
events = ['SIR','','CORE', '', 'RSNA', '']

midx = pd.MultiIndex.from_arrays(
   arrays=[blocks, weeks, dates, events], 
   names=['Block', 'Week', 'Date', 'Event']
)

df_summary.reindex(columns=midx)

# returns the following:
Block       Block 1               Block 2                         Block 10
Week            wk1        wk2        wk8        wk9       wk11       wk42
Date     01/15/2020 01/20/2020 04/15/2020 04/20/2020 05/01/2020 11/29/2020
Event           SIR                  CORE                  RSNA           
Rotation                                                                  
ABD               1          0          0          0          0          0
ER                0          0          0          0          0          1
MAM               0          1          0          0          0          0
PEDI              0          0          0          1          0          0
UMH               0          0          0          0          1          0
VIR               0          0          1          0          0          0
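
Typing out the repeated blocks and the matching events by hand is easy to get wrong. A possible variation, shown here only as a sketch, is to derive the target MultiIndex from the columns the pivot table already has and sort them by their position in the preferred lists. The names `block_rank`, `week_rank`, `ordered_cols`, and `result` are illustrative, and the sketch assumes every Block and Week value in the data appears in the corresponding list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['name1', 'name2', 'name1', 'name2', 'name2', 'name2'],
    'Block': ['Block 1', 'Block 1', 'Block 10', 'Block 2', 'Block 2', 'Block 2'],
    'Week': ['wk1', 'wk2', 'wk42', 'wk11', 'wk9', 'wk8'],
    'Date': ['01/15/2020', '01/20/2020', '11/29/2020',
             '05/01/2020', '04/20/2020', '04/15/2020'],
    'Events': ['SIR', '', '', 'RSNA', '', 'CORE'],
    'Rotation': ['ABD', 'MAM', 'ER', 'UMH', 'PEDI', 'VIR'],
})

df_summary = df.pivot_table(
    index='Rotation', columns=['Block', 'Week', 'Date', 'Events'],
    values='Name', aggfunc='count',
).fillna(0).astype(int)

blocks = ['Block 1', 'Block 2', 'Block 10']
weeks = ['wk1', 'wk2', 'wk8', 'wk9', 'wk11', 'wk42']

# Map each label to its position in the preferred order.
block_rank = {label: pos for pos, label in enumerate(blocks)}
week_rank = {label: pos for pos, label in enumerate(weeks)}

# Each column label is a (Block, Week, Date, Events) tuple; sort by list position.
ordered_cols = sorted(
    df_summary.columns,
    key=lambda col: (block_rank[col[0]], week_rank[col[1]]),
)

# Rebuild a MultiIndex from the sorted tuples and reindex with it.
midx = pd.MultiIndex.from_tuples(ordered_cols, names=df_summary.columns.names)
result = df_summary.reindex(columns=midx)
print(result)
```

Since the tuples come from the existing columns, the Date and Events levels follow along automatically and the level names stay as they were in the pivot table.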