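Group by and count to get a ratio in pandas

Time: 2017-09-07 05:25:31

Tags: python pandas dataframe group-by

This is another one on dataframes, asked in a great question, that can benefit from a set of solutions. Here is the question.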

I want to count, per country, the number of times the status is open and the number of times the status is closed, and then calculate the closerate per country.

Data:

   customer country    closeday  status
1         1      BE  2017-08-23  closed
2         2      NL  2017-08-05    open
3         3      NL  2017-08-22  closed
4         4      NL  2017-08-26  closed
5         5      BE  2017-08-25  closed
6         6      NL  2017-08-13    open
7         7      BE  2017-08-30  closed
8         8      BE  2017-08-05    open
9         9      NL  2017-08-23  closed

The idea is to get an output that describes the number of open and closed statuses, along with the close_ratio. This is the desired output:

country  closed  open  closed_ratio
BE            3     1          0.75
NL            3     2          0.60

Looking forward to your suggestions.
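For anyone who wants to run the answers, the sample data can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the question, but parsing closeday as datetimes and using 1 to 9 as index labels are assumptions made here to mirror the printed frame.

```python
import pandas as pd

# Sample data copied from the question.
# Assumptions: closeday is parsed to datetime, and the index runs 1..9
# to match the row labels shown in the printed frame.
df = pd.DataFrame(
    {
        "customer": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "closeday": pd.to_datetime(
            [
                "2017-08-23", "2017-08-05", "2017-08-22",
                "2017-08-26", "2017-08-25", "2017-08-13",
                "2017-08-30", "2017-08-05", "2017-08-23",
            ]
        ),
        "status": [
            "closed", "open", "closed", "closed", "closed",
            "open", "closed", "open", "closed",
        ],
    },
    index=range(1, 10),
)

print(df)
```

Only the country and status columns matter for the counts and the ratio; customer and closeday are carried along for completeness.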

Solutions are included in the answers below. Other solutions are welcome.

2 answers:

Answer 0: (score: 3)

Here are a few ways

1)

In [420]: (df.groupby(['country', 'status']).size().unstack()
             .assign(closed_ratio=lambda x: x.closed / x.sum(1)))
Out[420]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60

2)

In [422]: (pd.crosstab(df.country, df.status)
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[422]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60

3)

In [424]: (df.pivot_table(index='country', columns='status', aggfunc='size')
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[424]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60

4) Borrowed from piRSquared

In [430]: (df.set_index('country').status.str.get_dummies().groupby(level=0).sum()
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[430]:
         closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
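One edge case the sample data does not exercise is a country that has only one of the two statuses. The sketch below is not part of the original answer; it adds a hypothetical FR row with a single closed entry and passes fill_value=0 to unstack, so the missing open count shows as 0 instead of NaN and the count columns stay integers.

```python
import pandas as pd

# Country/status pairs from the question, plus one hypothetical "FR" row
# (not in the original data) that has a closed status and no open status.
df = pd.DataFrame(
    {
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL", "FR"],
        "status": [
            "closed", "open", "closed", "closed", "closed",
            "open", "closed", "open", "closed", "closed",
        ],
    }
)

# Count rows per (country, status) and pivot status into columns.
# fill_value=0 replaces the missing FR/open combination with 0.
counts = df.groupby(["country", "status"]).size().unstack(fill_value=0)

# Inside assign, x is the counts frame before closed_ratio exists,
# so the row-wise sum is closed + open.
result = counts.assign(closed_ratio=lambda x: x["closed"] / x.sum(axis=1))

print(result)
```

pd.crosstab already fills missing combinations with 0, so this adjustment mainly concerns the groupby/unstack variant.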

Answer 1: (score: 1)

Apply groupby and count the size of each group with size, then unstack level 1 (status) into columns.

df2 = df.groupby(['country', 'status']).status.size().unstack(level=1)
df2

status   closed  open
country
BE            3     1
NL            3     2

Now, calculate closed_ratio:

df2['closed_ratio'] = df2.closed / df2.sum(1)
df2

status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
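As a quick cross-check on the ratio alone, the share of closed rows per country equals the mean of a boolean "is closed" flag. This is a different technique from the count-then-divide approach in the answers, sketched here only to verify the numbers; the frame is rebuilt from the question's country and status columns.

```python
import pandas as pd

# Country/status pairs copied from the question.
df = pd.DataFrame(
    {
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "status": [
            "closed", "open", "closed", "closed", "closed",
            "open", "closed", "open", "closed",
        ],
    }
)

# True where the status is closed; the mean of a boolean Series is the
# fraction of True values, so grouping by country gives the closed ratio.
closed_ratio = (
    df["status"].eq("closed").groupby(df["country"]).mean().rename("closed_ratio")
)

print(closed_ratio)
```

This yields only the ratio, not the closed and open counts, so it complements the table-based solutions and does not replace them.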