Pandas Groupby:是否有标准化功能?或找到组中总数比例的最佳方法

时间:2018-09-07 06:12:10

标签: python pandas

数据:

{"Survived":{"0":0,"1":1,"2":1,"3":1,"4":0,"5":0,"6":0,"7":0,"8":1,"9":1,"10":1,"11":1,"12":0,"13":0,"14":0,"15":1,"16":0,"17":1,"18":0,"19":1,"20":0,"21":1,"22":1,"23":1,"24":0,"25":1,"26":0,"27":0,"28":1,"29":0,"30":0,"31":1,"32":1,"33":0,"34":0,"35":0,"36":1,"37":0,"38":0,"39":1,"40":0,"41":0,"42":0,"43":1,"44":1,"45":0,"46":0,"47":1,"48":0,"49":0},"Pclass":{"0":3,"1":1,"2":3,"3":1,"4":3,"5":3,"6":1,"7":3,"8":3,"9":2,"10":3,"11":1,"12":3,"13":3,"14":3,"15":2,"16":3,"17":2,"18":3,"19":3,"20":2,"21":2,"22":3,"23":1,"24":3,"25":3,"26":3,"27":1,"28":3,"29":3,"30":1,"31":1,"32":3,"33":2,"34":1,"35":1,"36":3,"37":3,"38":3,"39":3,"40":3,"41":2,"42":3,"43":2,"44":3,"45":3,"46":3,"47":3,"48":3,"49":3},"Sex":{"0":"male","1":"female","2":"female","3":"female","4":"male","5":"male","6":"male","7":"male","8":"female","9":"female","10":"female","11":"female","12":"male","13":"male","14":"female","15":"female","16":"male","17":"male","18":"female","19":"female","20":"male","21":"male","22":"female","23":"male","24":"female","25":"female","26":"male","27":"male","28":"female","29":"male","30":"male","31":"female","32":"female","33":"male","34":"male","35":"male","36":"male","37":"male","38":"female","39":"female","40":"female","41":"female","42":"male","43":"female","44":"female","45":"male","46":"male","47":"female","48":"male","49":"female"},"Age":{"0":22.0,"1":38.0,"2":26.0,"3":35.0,"4":35.0,"5":28.0,"6":54.0,"7":2.0,"8":27.0,"9":14.0,"10":4.0,"11":58.0,"12":20.0,"13":39.0,"14":14.0,"15":55.0,"16":2.0,"17":28.0,"18":31.0,"19":28.0,"20":35.0,"21":34.0,"22":15.0,"23":28.0,"24":8.0,"25":38.0,"26":28.0,"27":19.0,"28":28.0,"29":28.0,"30":40.0,"31":28.0,"32":28.0,"33":66.0,"34":28.0,"35":42.0,"36":28.0,"37":21.0,"38":18.0,"39":14.0,"40":40.0,"41":27.0,"42":28.0,"43":3.0,"44":19.0,"45":28.0,"46":28.0,"47":28.0,"48":28.0,"49":18.0},"SibSp":{"0":1,"1":1,"2":0,"3":1,"4":0,"5":0,"6":0,"7":3,"8":0,"9":1,"10":1,"11":0,"12":0,"13":1,"14":0,"15":0,"16":4,"17":0,"18":1,"19":0,"20":0,"21":0,"22":0,"23":0,"24":3,"25":1,"26":0,"27":3,"28":0,"29":0,"30":0,"31":1,"32":0,"33":0,"34":1,"35":1,"36":0,"37":0,"38":2,"39":1,"40":1,"41":1,"42":0,"43":1,"44":0,"45":0,"46":1,"47":0,"48":2,"49":1},"Parch":{"0":0,"1":0,"2":0,"3":0,"4":0,"5":0,"6":0,"7":1,"8":2,"9":0,"10":1,"11":0,"12":0,"13":5,"14":0,"15":0,"16":1,"17":0,"18":0,"19":0,"20":0,"21":0,"22":0,"23":0,"24":1,"25":5,"26":0,"27":2,"28":0,"29":0,"30":0,"31":0,"32":0,"33":0,"34":0,"35":0,"36":0,"37":0,"38":0,"39":0,"40":0,"41":0,"42":0,"43":2,"44":0,"45":0,"46":0,"47":0,"48":0,"49":0},"Fare":{"0":7.25,"1":71.2833,"2":7.925,"3":53.1,"4":8.05,"5":8.4583,"6":51.8625,"7":21.075,"8":11.1333,"9":30.0708,"10":16.7,"11":26.55,"12":8.05,"13":31.275,"14":7.8542,"15":16.0,"16":29.125,"17":13.0,"18":18.0,"19":7.225,"20":26.0,"21":13.0,"22":8.0292,"23":35.5,"24":21.075,"25":31.3875,"26":7.225,"27":263.0,"28":7.8792,"29":7.8958,"30":27.7208,"31":146.5208,"32":7.75,"33":10.5,"34":82.1708,"35":52.0,"36":7.2292,"37":8.05,"38":18.0,"39":11.2417,"40":9.475,"41":21.0,"42":7.8958,"43":41.5792,"44":7.8792,"45":8.05,"46":15.5,"47":7.75,"48":21.6792,"49":17.8},"Embarked":{"0":"S","1":"C","2":"S","3":"S","4":"S","5":"Q","6":"S","7":"S","8":"S","9":"C","10":"S","11":"S","12":"S","13":"S","14":"S","15":"S","16":"Q","17":"S","18":"S","19":"C","20":"S","21":"S","22":"Q","23":"S","24":"S","25":"S","26":"C","27":"S","28":"Q","29":"S","30":"C","31":"C","32":"Q","33":"S","34":"C","35":"S","36":"C","37":"S","38":"S","39":"C","40":"S","41":"S","42":"C","43":"C","44":"Q","45":"S","46":"Q","47":"Q","48":"C","49":"S"},"Sex_Code":{"0":1,"1":0,"2":0,"3":0,"4":1,"5":1,"6":1,"7":1,"8":0,"9":0,"10":0,"11":0,"12":1,"13":1,"14":0,"15":0,"16":1,"17":1,"18":0,"19":0,"20":1,"21":1,"22":0,"23":1,"24":0,"25":0,"26":1,"27":1,"28":0,"29":1,"30":1,"31":0,"32":0,"33":1,"34":1,"35":1,"36":1,"37":1,"38":0,"39":0,"40":0,"41":0,"42":1,"43":0,"44":0,"45":1,"46":1,"47":0,"48":1,"49":0},"Embarked_Code":{"0":2,"1":0,"2":2,"3":2,"4":2,"5":1,"6":2,"7":2,"8":2,"9":0,"10":2,"11":2,"12":2,"13":2,"14":2,"15":2,"16":1,"17":2,"18":2,"19":0,"20":2,"21":2,"22":1,"23":2,"24":2,"25":2,"26":0,"27":2,"28":1,"29":2,"30":0,"31":0,"32":1,"33":2,"34":0,"35":2,"36":0,"37":2,"38":2,"39":0,"40":2,"41":2,"42":0,"43":0,"44":1,"45":2,"46":1,"47":1,"48":0,"49":2}}

我正在使用Kaggle的Titanic数据集。我试图找出每个Pclass男女中存活的百分比。

分组示例:

train_df.groupby(['Pclass','Sex','Survived']).apply(lambda x: len(x)).unstack(2).plot(kind='bar')

这向我展示了每个班级中有多少男人和女人幸存下来,有多少人没有幸存下来,但从视觉上看,每个班级中有多少男人和女人得以幸存。

所需结果:

train_df.groupby(['Pclass','Sex','Survived']).apply(lambda x: len(x)).unstack(2)[1]/(train_df.groupby(['Pclass','Sex','Survived']).apply(lambda x: len(x)).unstack(2)[1]+train_df.groupby(['Pclass','Sex','Survived']).apply(lambda x: len(x)).unstack(2)[0])

这看起来好像获得了预期的结果,但是我想知道是否还有更多的Python方式可以做到这一点?就像normalize = True选项一样。

最终目标:

每个Pclass中每种性别幸存的比率的条形图

3 个答案:

答案 0 :(得分:2)

您在lenunstack的轨道上。但是,您可以使用count来简化此操作:

v = df.groupby(['Pclass','Sex','Survived']).Sex.count().unstack(2)

现在,找到平均值。

v[1] / v.sum(1)

Pclass  Sex   
1       female    1.000000
        male      0.166667
2       female    0.750000
        male      0.500000
3       female    0.647059
        male      0.066667
dtype: float64

(v[1] / v.sum(1)).plot(kind='bar')作图。

答案 1 :(得分:2)

groupby

df.groupby(['Pclass', 'Sex']).Survived.mean().unstack()

Sex       female      male
Pclass                    
1       1.000000  0.166667
2       0.750000  0.500000
3       0.647059  0.066667

pivot_table

df.pivot_table('Survived', 'Pclass', 'Sex')

Sex       female      male
Pclass                    
1       1.000000  0.166667
2       0.750000  0.500000
3       0.647059  0.066667

答案 2 :(得分:2)

您可以将Groupby sumsize用作:

g = df.groupby(['Pclass','Sex'])['Survived']
g.sum()/g.size()

Pclass  Sex   
1       female    1.000000
        male      0.166667
2       female    0.750000
        male      0.500000
3       female    0.647059
        male      0.066667
Name: Survived, dtype: float64

(g.sum()/g.size()).unstack()

Sex female  male
Pclass      
1   1.000000    0.166667
2   0.750000    0.500000
3   0.647059    0.066667