Bar chart from a pivot table, with totals and the percentage of each group's total

Time: 2019-04-28 18:49:50

Tags: python pandas matplotlib

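Here is the challenge: create a data frame from the shipwreck.csv file. From this data frame, build a pivot table that shows the average fare for males/females in each class, and the number of males/females in each class who survived. The row index should be the class values. Use margins to include the averages for all males, all females, and all passengers in each class. Print the whole frame, then create a bar chart that shows the survival rate of males, females, and all passengers for each class. Use the data from the pivot table in the previous question. The bars should have a width of .25.

My problem is that I have built the data frame with only the specified columns, but I don't understand how to get a pivot table from the data frame, or how to find the average fare for males/females so that I can set up the chart.

Here is my code so far: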

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
matplotlib.rcParams['figure.figsize'] = (10.0, 4.0)

df = pd.read_csv("shipwreck.csv",
                 usecols=['survived', 'sex', 'fare', 'class'])
# note: set_index returns a new DataFrame; without assignment this line has no effect
df.set_index('survived')
print(df)
# pivot table to get average fares for male/female, then plot it
# use bar graph with a width of .25 for the bars

This is what the .csv looks like in the data frame:

             survived     sex      fare   class
        0           0    male    7.2500   Third
        1           1  female   71.2833   First
        2           1  female    7.9250   Third
        3           1  female   53.1000   First
        4           0    male    8.0500   Third
        5           0    male    8.4583   Third
        6           0    male   51.8625   First
        7           0    male   21.0750   Third
        8           1  female   11.1333   Third
        9           1  female   30.0708  Second
        10          1  female   16.7000   Third
        11          1  female   26.5500   First
        12          0    male    8.0500   Third
        13          0    male   31.2750   Third
        14          0  female    7.8542   Third
        15          1  female   16.0000  Second
        16          0    male   29.1250   Third
        17          1    male   13.0000  Second
        18          0  female   18.0000   Third
        19          1  female    7.2250   Third
        20          0    male   26.0000  Second
        21          1    male   13.0000  Second
        22          1  female    8.0292   Third
        23          1    male   35.5000   First
        24          0  female   21.0750   Third
        25          1  female   31.3875   Third
        26          0    male    7.2250   Third
        27          0    male  263.0000   First
        28          1  female    7.8792   Third
        29          0    male    7.8958   Third
        ..        ...     ...       ...     ...
        861         0    male   11.5000  Second
        862         1  female   25.9292   First
        863         0  female   69.5500   Third
        864         0    male   13.0000  Second
        865         1  female   13.0000  Second
        866         1  female   13.8583  Second
        867         0    male   50.4958   First
        868         0    male    9.5000   Third
        869         1    male   11.1333   Third
        870         0    male    7.8958   Third
        871         1  female   52.5542   First
        872         0    male    5.0000   First
        873         0    male    9.0000   Third
        874         1  female   24.0000  Second
        875         1  female    7.2250   Third
        876         0    male    9.8458   Third
        877         0    male    7.8958   Third
        878         0    male    7.8958   Third
        879         1  female   83.1583   First
        880         1  female   26.0000  Second
        881         0    male    7.8958   Third
        882         0  female   10.5167   Third
        883         0    male   10.5000  Second
        884         0    male    7.0500   Third
        885         0  female   29.1250   Third
        886         0    male   13.0000  Second
        887         1  female   30.0000   First
        888         0  female   23.4500   Third
        889         1    male   30.0000   First
        890         0    male    7.7500   Third

        [891 rows x 4 columns]

The bar chart should look like this:

[image: expected bar chart]

1 answer:

Answer 0: (score: 3)

Here is what you can do:

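df = pd.read_csv('shipwreck.csv', usecols=['survived', 'sex', 'class'])
# 'survived' is the only column left to aggregate, so no values= is needed
df_piv = pd.pivot_table(df,
                        index='class',
                        columns='sex',
                        aggfunc=lambda x: 100*x.sum()/x.count(), # % survived per group
                        margins=True,
                        margins_name='Combined')
# drop the outer 'survived' level so the columns are just the sex labels and 'Combined'
df_piv.columns = df_piv.columns.droplevel()
df_piv.plot.bar(rot='horizontal');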
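
The snippet above covers the survival percentages. The first part of the challenge (average fare and number of survivors per class and sex, with margins) could be sketched along the same lines. This is only a sketch under assumptions: the twelve rows below are copied from the printout in the question as a stand-in for shipwreck.csv, and the names `sample` and `summary` are made up for illustration.

```python
import pandas as pd

# Stand-in for shipwreck.csv: twelve rows copied from the question's printout.
# With the real file, use pd.read_csv("shipwreck.csv", usecols=[...]) instead.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "female", "male", "male",
            "female", "female", "female", "male", "male", "male"],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625,
             30.0708, 7.8542, 16.0, 13.0, 26.0, 35.5],
    "class": ["Third", "First", "Third", "First", "Third", "First",
              "Second", "Third", "Second", "Second", "Second", "First"],
})

# One aggregation per value column: mean fare, and the sum of the 0/1
# 'survived' flag (which equals the number of survivors).
summary = pd.pivot_table(
    sample,
    index="class",
    columns="sex",
    values=["fare", "survived"],
    aggfunc={"fare": "mean", "survived": "sum"},
    margins=True,          # adds an 'All' row and an 'All' column per value
    margins_name="All",
)
print(summary)
```

The result has two column levels (`fare` / `survived` on top, the sex labels and the margin below), so a single cell is addressed as, for example, `summary.loc["First", ("fare", "female")]`.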

[image: resulting bar chart]
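
The challenge also asks for bars of width .25, which the call above does not set. As far as I know, the `width` argument of pandas' `plot.bar` applies to the whole group of bars rather than to each bar, so one option is to draw the bars with matplotlib directly. The following is a minimal sketch, not the only way to do it: the twelve sample rows are copied from the question's printout as a stand-in for shipwreck.csv, and the names `sample`, `rates` and the output file name are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for shipwreck.csv: twelve rows copied from the question's printout.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "female", "male", "male",
            "female", "female", "female", "male", "male", "male"],
    "class": ["Third", "First", "Third", "First", "Third", "First",
              "Second", "Third", "Second", "Second", "Second", "First"],
})

# The mean of a 0/1 column is the survival rate; times 100 gives a percentage.
rates = 100 * pd.pivot_table(sample, index="class", columns="sex",
                             values="survived", aggfunc="mean",
                             margins=True, margins_name="Combined")

width = 0.25                      # width of each individual bar
x = np.arange(len(rates.index))   # one group of bars per row of the pivot table
n = len(rates.columns)

fig, ax = plt.subplots(figsize=(10, 4))
for i, col in enumerate(rates.columns):
    # shift each column's bars so that the group is centred on its tick
    offset = (i - (n - 1) / 2) * width
    ax.bar(x + offset, rates[col], width=width, label=col)

ax.set_xticks(x)
ax.set_xticklabels(rates.index)
ax.set_ylabel("Survived (%)")
ax.legend(title="sex")
fig.savefig("survival_by_class.png")  # hypothetical file name
```

With three columns of width .25 each group spans .75, so neighbouring groups stay visually separated on integer tick positions.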