Python Pandas DataFrame pivot: computing mean and median for the top 20

Time: 2018-04-07 03:23:06

Tags: python pandas dataframe pivot

I am trying to create a pivot table with calculated columns in Python. I have a large amount of data loaded in a DataFrame.

The data has only a few columns, namely Customer, AlertKey, and MTTR.


Expected output


The expected output is grouped by Customer and AlertKey (top 5 AlertKeys only), with the mean and median MTTR shown for each AlertKey.

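The DataFrame is built by pulling data from several database tables. I am now stuck on how to produce this summary. It cannot be done easily in Excel, because the records have to be pulled from multiple databases and computing a median in an Excel pivot table is painful. The process also needs to be automated.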

df.groupby(['Customer','AlertKey']).AlertKey.value_counts().nlargest(20)

The Excel File with Sample Data

1 answer:

Answer 0 (score: 0)

First consider creating indicator columns that rank each AlertKey by its value count, then run a groupby().agg() call:

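# Number of rows for each (Customer, AlertKey) pair, repeated on every row of that pair
df['AKCount'] = df.groupby(['Customer', 'AlertKey'])['AlertKey'].transform('count')
# Rank the counts within each customer; dense + descending gives the most frequent AlertKey rank 1
df['AKCountRank'] = df.groupby('Customer')['AKCount'].rank(method='dense', ascending=False)

# Keep only the top 20 ranks per customer (AlertKeys with equal counts share a rank)
sub = df[df['AKCountRank'] <= 20]

# Count, mean and median of MTTR per pair; [['MTTR']] keeps MTTR as the top column level
gdf = sub.groupby(['Customer', 'AlertKey'])[['MTTR']].agg(['count', 'mean', 'median'])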

Data

from io import StringIO
import pandas as pd

txt ="""
Customer AlertKey MTTR
C1 C1A1 38
C1 C1A2 25
C2 C2A5 40
C1 C1A1 50
C3 C3A7 60
C3 C3A7 23
C5 C5A8 29
C3 C3A7 30
C5 C5A8 40
"""

df = pd.read_table(StringIO(txt), sep=r"\s+")

Output

print(gdf)

#                    MTTR                  
#                   count       mean median
# Customer AlertKey                        
# C1       C1A1         2  44.000000   44.0
#          C1A2         1  25.000000   25.0
# C2       C2A5         1  40.000000   40.0
# C3       C3A7         3  37.666667   30.0
# C5       C5A8         2  34.500000   34.5
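
As an alternative sketch (not part of the original answer), the top-N selection can also be done after aggregating: compute the statistics per (Customer, AlertKey) pair first, sort by count within each customer, and keep the first n rows per customer with head(n). The helper name top_n_mttr_stats and its n parameter are made up for illustration. Ties in count are broken by AlertKey order here, which is an assumption about what "top" should mean.

```python
from io import StringIO
import pandas as pd

txt = """
Customer AlertKey MTTR
C1 C1A1 38
C1 C1A2 25
C2 C2A5 40
C1 C1A1 50
C3 C3A7 60
C3 C3A7 23
C5 C5A8 29
C3 C3A7 30
C5 C5A8 40
"""
df = pd.read_csv(StringIO(txt), sep=r"\s+")


def top_n_mttr_stats(frame, n=5):
    """Hypothetical helper: MTTR count/mean/median for the n most frequent
    AlertKeys of each customer."""
    # One row per (Customer, AlertKey) pair with its MTTR statistics
    stats = (
        frame.groupby(["Customer", "AlertKey"])["MTTR"]
        .agg(["count", "mean", "median"])
        .reset_index()
    )
    # Most frequent AlertKeys first within each customer
    stats = stats.sort_values(["Customer", "count"], ascending=[True, False])
    # Keep at most n AlertKeys per customer
    top = stats.groupby("Customer").head(n)
    return top.set_index(["Customer", "AlertKey"])


result = top_n_mttr_stats(df, n=1)
print(result)
```

Unlike the rank-based filter, head(n) never returns more than n AlertKeys per customer, even when several AlertKeys have the same count.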