How to aggregate data as percentages in pandas

Time: 2016-02-16 11:32:34

Tags: pandas group-by aggregate pivot-table crosstab

This code:

    import pandas as pd

    # Missing analysis for actions - which action is missing the most action_types?
    # clean_sessions is the asker's DataFrame with `action` and `action_type` columns.
    grouped_missing_analysis = pd.crosstab(clean_sessions.action_type, clean_sessions.action, margins=True).unstack()
    grouped_unknown = grouped_missing_analysis.loc(axis=0)[slice(None), ['Missing', 'Unknown', 'Other']]
    print(grouped_unknown)

prints:

action                        action_type
10                            Missing              0
                              Unknown              0
11                            Missing              0
                              Unknown              0
12                            Missing              0
                              Unknown              0
15                            Missing              0
                              Unknown              0
about_us                      Missing              0
                              Unknown            416
accept_decline                Missing              0
                              Unknown              0
account                       Missing              0
                              Unknown           9040
acculynk_bin_check_failed     Missing              0
                              Unknown              1
acculynk_bin_check_success    Missing              0
                              Unknown             51
acculynk_load_pin_pad         Missing              0
                              Unknown             50

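How can I now aggregate the Missing, Unknown, and Other counts of each action into a single total per action, and show that total as a percentage of All, so I can see what share of the Missing, Unknown, or Other action_types each action accounts for? For example, there would be one row per action, and the about_us row would be (416 + 0) / (total Missing + Unknown + Other over all actions).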

For context, see this question.

The catch is that the output above also contains a row named All, which is the sum of everything, so:

All                           Missing        1126204
                              Unknown        1031170

The desired output would be:

action                        percent_total_missing_action_type
10                            0
11                            0
12                            0
15                            0
about_us                      416/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
accept_decline                0
account                       9040/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
acculynk_bin_check_failed     1/total_missing_action_type (in the All row - 2157374, or the sum of everything in the action_type column)
etc..

Here is some test data:

action                        action_type
a                             Missing              2
                              Unknown              5
b                             Missing              3
                              Unknown              4
c                             Missing              5
                              Unknown              6
d                             Missing              1
                              Unknown              9
All                           Missing             11
                              Unknown             24

which should turn into this:

action                        action_type_percentage
a                             Missing              2/11
                              Unknown              5/24
b                             Missing              3/11
                              Unknown              4/24
c                             Missing              5/11
                              Unknown              6/24
d                             Missing              1/11
                              Unknown              9/24
All                           Missing             11/11
                              Unknown             24/24

1 answer:

Answer 0 (score: 1)

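First, you can use xs to select the All values from the MultiIndex, then divide the original Series by them. Finally, you can call reset_index if you want a flat table.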

print(df)
action  action_type
a       Missing         2
        Unknown         5
b       Missing         3
        Unknown         4
c       Missing         5
        Unknown         6
d       Missing         1
        Unknown         9
All     Missing        11
        Unknown        24
dtype: int64
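
The answer does not show how df was built. A minimal sketch of one way to rebuild it from the question's test data follows; the use of MultiIndex.from_product is an assumption made for illustration, not the answerer's code. Note that df is a Series here, despite its name.

```python
import pandas as pd

# Rebuild the question's test data as a Series with a two-level MultiIndex.
# The name `df` follows the answer, even though the object is a Series.
index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)
print(df)
```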

print(df.xs('All'))
Missing    11
Unknown    24
dtype: int64

print(df / df.xs('All'))
action  action_type
a       Missing        0.181818
        Unknown        0.208333
b       Missing        0.272727
        Unknown        0.166667
c       Missing        0.454545
        Unknown        0.250000
d       Missing        0.090909
        Unknown        0.375000
All     Missing        1.000000
        Unknown        1.000000
dtype: float64
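
The reset_index step is mentioned but not shown, and the question also asks for one combined figure per action. The sketch below covers both under some assumptions: it uses Series.div with an explicit level instead of the bare / operator, it borrows the column names from the question, and it drops the All row before summing per action. It rebuilds the test data so that it runs on its own.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b", "c", "d", "All"], ["Missing", "Unknown"]],
    names=["action", "action_type"],
)
df = pd.Series([2, 5, 3, 4, 5, 6, 1, 9, 11, 24], index=index)

# Totals per action_type, taken from the "All" margin row.
totals = df.xs("All", level="action")

# Divide each value by the total of its action_type; `level` makes the
# alignment on the second index level explicit.
share = df.div(totals, level="action_type")

# Flatten to a regular table; the column name is borrowed from the question.
table = share.rename("action_type_percentage").reset_index()
print(table)

# One row per action: (Missing + Unknown) of that action over the grand total.
per_action = df.drop("All", level="action").groupby(level="action").sum()
combined = (per_action / totals.sum()).rename("percent_total_missing_action_type")
print(combined)
```

With the real data, the second part would give the about_us row as 416 / 2157374, which matches the first form of the desired output in the question.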