Pivoting a DataFrame with duplicate values in the index

Asked: 2015-07-15 08:14:17

Tags: python pandas

I have a pandas DataFrame like this:

    snapDate     instance   waitEvent                   AvgWaitInMs
0   2015-Jul-03  XX         gc cr block 3-way               1
1   2015-Jun-29  YY         gc current block 3-way          2
2   2015-Jul-03  YY         gc current block 3-way          1
3   2015-Jun-29  XX         gc current block 3-way          2
4   2015-Jul-01  XX         gc current block 3-way          2
5   2015-Jul-01  YY         gc current block 3-way          2
6   2015-Jul-03  XX         gc current block 3-way          2
7   2015-Jul-03  YY         log file sync                   9
8   2015-Jun-29  XX         log file sync                   8
9   2015-Jul-03  XX         log file sync                   8
10  2015-Jul-01  XX         log file sync                   8
11  2015-Jul-01  YY         log file sync                   9
12  2015-Jun-29  YY         log file sync                   8

I need to convert it to:

snapDate      instance  gc cr block 3-way  gc current block 3-way  log file sync
2015-Jul-03   XX        1                  Na                      8
2015-Jun-29   YY        Na                 2                       8
2015-Jul-03   YY        Na                 1                       9
...

I tried pivot, but it raised an error:

    dfWaits.pivot(index='snapDate', columns='waitEvent', values='AvgWaitInMs')

    Index contains duplicate entries, cannot reshape

The result should be another DataFrame.
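The pivot call fails because snapDate alone does not identify a row: for a given waitEvent, the same date usually appears once per instance, so two values compete for one cell. A minimal sketch of how to confirm this, assuming the frame is named dfWaits as in the call above and retyping the data from the table by hand:

```python
import pandas as pd

# Data retyped from the question's table (an assumption: the real frame may hold more rows).
rows = [
    ("2015-Jul-03", "XX", "gc cr block 3-way", 1),
    ("2015-Jun-29", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "gc current block 3-way", 1),
    ("2015-Jun-29", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "XX", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "log file sync", 9),
    ("2015-Jun-29", "XX", "log file sync", 8),
    ("2015-Jul-03", "XX", "log file sync", 8),
    ("2015-Jul-01", "XX", "log file sync", 8),
    ("2015-Jul-01", "YY", "log file sync", 9),
    ("2015-Jun-29", "YY", "log file sync", 8),
]
dfWaits = pd.DataFrame(
    rows, columns=["snapDate", "instance", "waitEvent", "AvgWaitInMs"]
)

# Rows that share a (snapDate, waitEvent) pair with at least one other row.
# These are the entries pivot cannot place into a single cell.
clashes = dfWaits[dfWaits.duplicated(subset=["snapDate", "waitEvent"], keep=False)]
print(clashes.sort_values(["snapDate", "waitEvent"]))

# Adding instance to the key removes the ambiguity.
has_full_key_dupes = dfWaits.duplicated(
    subset=["snapDate", "instance", "waitEvent"]
).any()
print(has_full_key_dupes)

# Reproduce the failure and capture its message instead of crashing.
try:
    dfWaits.pivot(index="snapDate", columns="waitEvent", values="AvgWaitInMs")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)
print(pivot_error)
```

If the second check prints False, any reshape keyed on snapDate, instance and waitEvent together should be unambiguous.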

1 answer:

Answer 0 (score: 1)

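Here is one way to reshape the DataFrame into something close to what you want. If you have any other specific requirements for the resulting DataFrame, please let me know.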

import pandas as pd

# your data
# ====================================
print(df)

       snapDate instance               waitEvent  AvgWaitInMs
0                                                            
0   2015-Jul-03       XX       gc cr block 3-way            1
1   2015-Jun-29       YY  gc current block 3-way            2
2   2015-Jul-03       YY  gc current block 3-way            1
3   2015-Jun-29       XX  gc current block 3-way            2
4   2015-Jul-01       XX  gc current block 3-way            2
5   2015-Jul-01       YY  gc current block 3-way            2
6   2015-Jul-03       XX  gc current block 3-way            2
7   2015-Jul-03       YY           log file sync            9
8   2015-Jun-29       XX           log file sync            8
9   2015-Jul-03       XX           log file sync            8
10  2015-Jul-01       XX           log file sync            8
11  2015-Jul-01       YY           log file sync            9
12  2015-Jun-29       YY           log file sync            8

# processing
# ====================================
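# Move the three key columns into the index, then unstack the innermost
# level (waitEvent) into columns; missing combinations are filled with 0.
df_temp = df.set_index(['snapDate', 'instance', 'waitEvent']).unstack().fillna(0)

# The columns are now a MultiIndex of (AvgWaitInMs, waitEvent); keep only the event names.
df_temp.columns = df_temp.columns.get_level_values(1).values

# Turn instance back into a regular column, leaving snapDate as the index.
df_temp = df_temp.reset_index('instance')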

print(df_temp)

            instance  gc cr block 3-way  gc current block 3-way  log file sync
snapDate                                                                      
2015-Jul-01       XX                  0                       2              8
2015-Jul-01       YY                  0                       2              9
2015-Jul-03       XX                  1                       2              8
2015-Jul-03       YY                  0                       1              9
2015-Jun-29       XX                  0                       2              8
2015-Jun-29       YY                  0                       2              8
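
An alternative that may be closer to the requested layout is pivot_table with both snapDate and instance as the row key, which leaves missing combinations as NaN instead of 0. This is only a sketch, not part of the original answer: the data is retyped from the question, the name wide is made up for illustration, and aggfunc="first" is an assumption that each (snapDate, instance, waitEvent) combination occurs at most once.

```python
import pandas as pd

# Data retyped from the question's table.
rows = [
    ("2015-Jul-03", "XX", "gc cr block 3-way", 1),
    ("2015-Jun-29", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "gc current block 3-way", 1),
    ("2015-Jun-29", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "XX", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "log file sync", 9),
    ("2015-Jun-29", "XX", "log file sync", 8),
    ("2015-Jul-03", "XX", "log file sync", 8),
    ("2015-Jul-01", "XX", "log file sync", 8),
    ("2015-Jul-01", "YY", "log file sync", 9),
    ("2015-Jun-29", "YY", "log file sync", 8),
]
df = pd.DataFrame(rows, columns=["snapDate", "instance", "waitEvent", "AvgWaitInMs"])

# Use (snapDate, instance) as the row key so each cell receives at most one value.
# aggfunc="first" simply passes that single value through; if the real data had
# repeats for the same key, an explicit aggregate such as "mean" would be needed.
wide = df.pivot_table(
    index=["snapDate", "instance"],
    columns="waitEvent",
    values="AvgWaitInMs",
    aggfunc="first",
).reset_index()

# Drop the leftover "waitEvent" label on the column axis for a flat header.
wide.columns.name = None

print(wide)
```

Combinations that never occur, such as gc cr block 3-way for instance YY, stay as NaN here; append .fillna(0) if zeros are preferred, as in the answer above.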