How to group rows and extract mean values

Time: 2017-01-30 10:11:34

Tags: python pandas

I have the following data:

df =
    QUEUE_1   QUEUE_2   QUEUE_3   HOUR   TOTAL_SERVICE_TIME  TOTAL_WAIT_TIME
    ABC123    DEF656              7      20                  30
    ABC123                        7      22                  32
    DEF656    ABC123    FED456    8      15                  12
    FED456    DEF656              8      15                  16

I want to compute the average TOTAL_SERVICE_TIME and TOTAL_WAIT_TIME for each QUEUE type (ABC123, DEF656, FED456) and each HOUR.

The result should look like this:

result =
    QUEUE    HOUR   AVG_TOT_SERVICE_TIME   AVG_TOT_WAIT_TIME
    ABC123   7      21                     31
    ABC123   8      15                     12
    DEF656   7      20                     30
    DEF656   8      15                     14
    FED456   7      0                      0
    FED456   8      15                     14

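This is my current code, but it does not seem to give the expected result. In particular, the HOUR values are not sorted, and the averages of TOTAL_SERVICE_TIME and TOTAL_WAIT_TIME are not computed correctly.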

cols = ['QUEUE', 'HOUR', 'TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME']
result = pd.melt(
    df, ['HOUR', 'TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME'],
    ['QUEUE_1', 'QUEUE_2', 'QUEUE_3'],
    value_name='QUEUE')[cols] 

2 answers:

Answer 0 (score: 2)

I think you first need to reshape your data with melt or lreshape:

result = pd.lreshape(df, {'QUEUE': ['QUEUE_1','QUEUE_2','QUEUE_3']})
print (result)
   HOUR  TOTAL_SERVICE_TIME  TOTAL_WAIT_TIME   QUEUE
0     7                  20               30  ABC123
1     7                  22               32  ABC123
2     8                  15               12  DEF656
3     8                  15               16  FED456
4     7                  20               30  DEF656
5     8                  15               12  ABC123
6     8                  15               16  DEF656
7     8                  15               12  FED456
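The answer names melt as an alternative but only shows lreshape. A minimal sketch of the melt route follows, assuming the empty cells in the question's table are NaN (the DataFrame construction and the name long_df are illustrative, not from the original post). Unlike lreshape, melt keeps missing values and adds a 'variable' column, so both are dropped explicitly:

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'QUEUE_1': ['ABC123', 'ABC123', 'DEF656', 'FED456'],
    'QUEUE_2': ['DEF656', np.nan, 'ABC123', 'DEF656'],
    'QUEUE_3': [np.nan, np.nan, 'FED456', np.nan],
    'HOUR': [7, 7, 8, 8],
    'TOTAL_SERVICE_TIME': [20, 22, 15, 15],
    'TOTAL_WAIT_TIME': [30, 32, 12, 16],
})

long_df = (
    df.melt(
        id_vars=['HOUR', 'TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME'],
        value_vars=['QUEUE_1', 'QUEUE_2', 'QUEUE_3'],
        value_name='QUEUE',
    )
    .drop(columns='variable')   # column that records which QUEUE_n each value came from
    .dropna(subset=['QUEUE'])   # melt keeps missing queues, so remove them here
    .reset_index(drop=True)
)
print(long_df)
```

This should produce the same eight rows as the lreshape output above.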

Then use groupby with mean, and reindex with a MultiIndex built from the unique values of QUEUE and HOUR, so that missing combinations are filled with 0:

mux = pd.MultiIndex.from_product([result.QUEUE.dropna().unique(),
                                  result.dropna().HOUR.unique()], names=['QUEUE','HOUR'])

print (result.groupby(['QUEUE','HOUR'])
             .mean()
             .reindex(mux, fill_value=0)
             .add_prefix('AVG_')
             .reset_index())
    QUEUE  HOUR  AVG_TOTAL_SERVICE_TIME  AVG_TOTAL_WAIT_TIME
0  ABC123     7                      21                   31
1  ABC123     8                      15                   12
2  DEF656     7                      20                   30
3  DEF656     8                      15                   14
4  FED456     7                       0                    0
5  FED456     8                      15                   14
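For reference, here is a self-contained sketch of the whole pipeline from this answer. It makes a few assumptions that are not in the original post: the empty cells are NaN, melt is swapped in for lreshape, the two numeric columns are selected explicitly before mean() (a precaution for pandas versions that no longer skip non-numeric columns silently), and the index levels are sorted so the output order is deterministic. The names long_df and all_pairs are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'QUEUE_1': ['ABC123', 'ABC123', 'DEF656', 'FED456'],
    'QUEUE_2': ['DEF656', np.nan, 'ABC123', 'DEF656'],
    'QUEUE_3': [np.nan, np.nan, 'FED456', np.nan],
    'HOUR': [7, 7, 8, 8],
    'TOTAL_SERVICE_TIME': [20, 22, 15, 15],
    'TOTAL_WAIT_TIME': [30, 32, 12, 16],
})

# Wide to long: one row per (original row, non-missing queue).
long_df = df.melt(
    id_vars=['HOUR', 'TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME'],
    value_vars=['QUEUE_1', 'QUEUE_2', 'QUEUE_3'],
    value_name='QUEUE',
).dropna(subset=['QUEUE'])

# Every (QUEUE, HOUR) combination, including ones with no data.
all_pairs = pd.MultiIndex.from_product(
    [sorted(long_df['QUEUE'].unique()), sorted(long_df['HOUR'].unique())],
    names=['QUEUE', 'HOUR'],
)

result = (
    long_df.groupby(['QUEUE', 'HOUR'])[['TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME']]
    .mean()
    .reindex(all_pairs, fill_value=0)   # missing combinations become 0
    .add_prefix('AVG_')
    .reset_index()
)
print(result)
```

The averages come back as floats here; cast with astype(int) if integer output is wanted.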

Answer 1 (score: 1)

Steps:

1) Use pd.lreshape to convert the DataFrame from wide to long format, collecting the columns whose names start with QUEUE (QUEUE_X) into a single resulting column named QUEUE.

2) Pivot the DataFrame with pivot_table, which uses np.mean as the aggregation function by default. Optionally, fill missing values with 0.

3) Stack the resulting DataFrame so that the HOUR column level moves into the index, giving a MultiIndex. Add a prefix to the column names and reset the index.

df = pd.lreshape(df, {'QUEUE': df.columns[df.columns.str.startswith('QUEUE')].tolist()})
piv_df = df.pivot_table(index=['QUEUE'], columns=['HOUR'], fill_value=0)
piv_df.stack().add_prefix('AVG_').reset_index()
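The three steps can also be sketched as a self-contained script. This is a variant rather than the answer's exact code: it assumes the empty cells are NaN, uses melt in place of lreshape, and passes values and aggfunc explicitly to pivot_table so that only the two time columns are averaged. The names long_df and piv are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'QUEUE_1': ['ABC123', 'ABC123', 'DEF656', 'FED456'],
    'QUEUE_2': ['DEF656', np.nan, 'ABC123', 'DEF656'],
    'QUEUE_3': [np.nan, np.nan, 'FED456', np.nan],
    'HOUR': [7, 7, 8, 8],
    'TOTAL_SERVICE_TIME': [20, 22, 15, 15],
    'TOTAL_WAIT_TIME': [30, 32, 12, 16],
})

# Step 1: wide to long, one row per non-missing queue.
queue_cols = df.columns[df.columns.str.startswith('QUEUE')].tolist()
long_df = df.melt(
    id_vars=['HOUR', 'TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME'],
    value_vars=queue_cols,
    value_name='QUEUE',
).dropna(subset=['QUEUE'])

# Step 2: mean per QUEUE (rows) and HOUR (columns); empty cells become 0.
piv = long_df.pivot_table(
    index='QUEUE',
    columns='HOUR',
    values=['TOTAL_SERVICE_TIME', 'TOTAL_WAIT_TIME'],
    aggfunc='mean',
    fill_value=0,
)

# Step 3: move HOUR from the columns into the index, prefix, and flatten.
result = piv.stack().add_prefix('AVG_').reset_index()
print(result)
```

Depending on the pandas version, stack() may emit a FutureWarning about its new implementation; because fill_value=0 leaves no missing cells, both implementations should give the same six rows here.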
