Generating a stacked bar chart with check buttons in Pandas

Time: 2016-11-02 19:23:35

Tags: python pandas matplotlib

I have a dataset of nearly 20k rows. Its attributes are: 1) year of birth; 2) sex; and 3) area name. I would like to create a stacked bar chart like the one shown here: [image]

Here is a glimpse of the data frame: [image]

How can I achieve this with pandas? Or, given the size of the dataset, is there a better approach?

Data frame:

"Year","SexCd","Sex","StatZoneSort","StatZoneLang","AreaCode","AreaName","Number Of Births"
2015,2,"W",1101,"Zähringerstrasse",11,"Rathaus",2
2015,1,"M",1101,"Zähringerstrasse",11,"Rathaus",2
2015,2,"W",1102,"Zentralbibliothek",11,"Rathaus",1
2015,1,"M",1102,"Zentralbibliothek",11,"Rathaus",3
2015,1,"M",1103,"Grossmünster",11,"Rathaus",6
2015,2,"W",1103,"Grossmünster",11,"Rathaus",4
2015,1,"M",1104,"Oberdorf",11,"Rathaus",2
2015,2,"W",1104,"Oberdorf",11,"Rathaus",1
2015,1,"M",1201,"Central",12,"Hochschulen",3
2015,2,"W",1201,"Central",12,"Hochschulen",1
2015,1,"M",1301,"Schipfe",13,"Lindenhof",4
2015,2,"W",1301,"Schipfe",13,"Lindenhof",1
2015,1,"M",1403,"Selnaustrasse",14,"City",4
2015,2,"W",1403,"Selnaustrasse",14,"City",1
2015,1,"M",2101,"Bahnhof Wollishofen",21,"Wollishofen",3
2015,2,"W",2101,"Bahnhof Wollishofen",21,"Wollishofen",6
2015,1,"M",2102,"Bellariastrasse",21,"Wollishofen",31
2015,2,"W",2102,"Bellariastrasse",21,"Wollishofen",19
2015,1,"M",2103,"Jugendherberge",21,"Wollishofen",7
2015,2,"W",2103,"Jugendherberge",21,"Wollishofen",6
2015,1,"M",2104,"Morgental",21,"Wollishofen",13
2015,2,"W",2104,"Morgental",21,"Wollishofen",12
2015,1,"M",2106,"Waschanstalt",21,"Wollishofen",3
2015,2,"W",2107,"Auf der Egg",21,"Wollishofen",10
2015,1,"M",2107,"Auf der Egg",21,"Wollishofen",8
2015,1,"M",2108,"Neubühl",21,"Wollishofen",14
2015,2,"W",2108,"Neubühl",21,"Wollishofen",22
2015,2,"W",2109,"Entlisberg",21,"Wollishofen",12
2015,1,"M",2109,"Entlisberg",21,"Wollishofen",17
2015,1,"M",2110,"Verenastrasse",21,"Wollishofen",9
2015,2,"W",2110,"Verenastrasse",21,"Wollishofen",8
2015,1,"M",2111,"Seeblickstrasse",21,"Wollishofen",4
2015,2,"W",2111,"Seeblickstrasse",21,"Wollishofen",4
2015,1,"M",2301,"Höckler",23,"Leimbach",15
2015,2,"W",2301,"Höckler",23,"Leimbach",10
2015,2,"W",2302,"Mahrbachweg",23,"Leimbach",16
2015,1,"M",2302,"Mahrbachweg",23,"Leimbach",10
2015,1,"M",2303,"Sihlweidstrasse",23,"Leimbach",18
2015,2,"W",2303,"Sihlweidstrasse",23,"Leimbach",21
2015,1,"M",2401,"Parkring",24,"Enge",10
2015,2,"W",2401,"Parkring",24,"Enge",7
2015,1,"M",2402,"Kongresshaus",24,"Enge",8
2015,2,"W",2402,"Kongresshaus",24,"Enge",4
2015,2,"W",2403,"Belvoir-Park",24,"Enge",10
2015,1,"M",2403,"Belvoir-Park",24,"Enge",15
2015,2,"W",2404,"Museum Rietberg",24,"Enge",29
2015,1,"M",2404,"Museum Rietberg",24,"Enge",22
2015,2,"W",2405,"Rieterplatz",24,"Enge",20
2015,1,"M",2405,"Rieterplatz",24,"Enge",26
2015,2,"W",2406,"Gartenstrasse",24,"Enge",2
2015,2,"W",3101,"Höfliweg",31,"Alt-Wiedikon",31
2015,1,"M",3101,"Höfliweg",31,"Alt-Wiedikon",33
2015,1,"M",3102,"Goldbrunnenplatz",31,"Alt-Wiedikon",20
2015,2,"W",3102,"Goldbrunnenplatz",31,"Alt-Wiedikon",17
2015,1,"M",3103,"Gotthelfstrasse",31,"Alt-Wiedikon",14
2015,2,"W",3103,"Gotthelfstrasse",31,"Alt-Wiedikon",12
2015,1,"M",3104,"Manesseplatz",31,"Alt-Wiedikon",22
2015,2,"W",3104,"Manesseplatz",31,"Alt-Wiedikon",27
2015,2,"W",3105,"Binz",31,"Alt-Wiedikon",17
2015,1,"M",3105,"Binz",31,"Alt-Wiedikon",21
2015,2,"W",3106,"Saalsporthalle",31,"Alt-Wiedikon",25
2015,1,"M",3106,"Saalsporthalle",31,"Alt-Wiedikon",31
2015,1,"M",3301,"Heuried",33,"Friesenberg",6
2015,2,"W",3301,"Heuried",33,"Friesenberg",13
2015,1,"M",3302,"Gehrenholz",33,"Friesenberg",8
2015,2,"W",3302,"Gehrenholz",33,"Friesenberg",5
2015,1,"M",3303,"Uetliberg",33,"Friesenberg",10
2015,2,"W",3303,"Uetliberg",33,"Friesenberg",6
2015,2,"W",3304,"Strassenverkehrsamt",33,"Friesenberg",8
2015,1,"M",3304,"Strassenverkehrsamt",33,"Friesenberg",9
2015,2,"W",3305,"Albisgüetli",33,"Friesenberg",12
2015,1,"M",3305,"Albisgüetli",33,"Friesenberg",8
2015,1,"M",3306,"Triemli",33,"Friesenberg",5
2015,2,"W",3306,"Triemli",33,"Friesenberg",9
2015,2,"W",3401,"Schaufelbergerstrasse",34,"Sihlfeld",22
2015,1,"M",3401,"Schaufelbergerstrasse",34,"Sihlfeld",23
2015,1,"M",3402,"Friedhof Sihlfeld",34,"Sihlfeld",8
2015,2,"W",3402,"Friedhof Sihlfeld",34,"Sihlfeld",8
2015,1,"M",3403,"Brahmsstrasse",34,"Sihlfeld",18
2015,2,"W",3403,"Brahmsstrasse",34,"Sihlfeld",12
2015,2,"W",3404,"Fritschistrasse",34,"Sihlfeld",19
2015,1,"M",3404,"Fritschistrasse",34,"Sihlfeld",8
2015,1,"M",3405,"Idaplatz",34,"Sihlfeld",23
2015,2,"W",3405,"Idaplatz",34,"Sihlfeld",25
2015,1,"M",3406,"Zwinglihaus",34,"Sihlfeld",23
2015,2,"W",3406,"Zwinglihaus",34,"Sihlfeld",24
2015,2,"W",3407,"Bahnhof Wiedikon",34,"Sihlfeld",23
2015,1,"M",3407,"Bahnhof Wiedikon",34,"Sihlfeld",24
2015,1,"M",3408,"Sihlhölzli",34,"Sihlfeld",21
2015,2,"W",3408,"Sihlhölzli",34,"Sihlfeld",21
2015,1,"M",4101,"Kalkbreite",41,"Werd",11
2015,2,"W",4101,"Kalkbreite",41,"Werd",19

1 answer:

Answer 0 (score: 1)

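Before plotting, the data needs a few steps to get it into the right shape. First, it has to be aggregated by AreaCode and Sex (and possibly by Year, if needed). Calling .groupby on the data frame followed by .sum aggregates the data.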
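As a rough illustration of the aggregation step, here is a minimal sketch on a handful of rows copied from the question's data (areas 11 and 12 only). The names `sample` and `agg` are just placeholders for this example.

```python
import pandas as pd

# Mini-sample: a few rows from the question's data (names are placeholders)
sample = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2015, 2015, 2015],
    "Sex": ["W", "M", "W", "M", "M", "W"],
    "AreaCode": [11, 11, 11, 11, 12, 12],
    "Number Of Births": [2, 2, 1, 3, 3, 1],
})

# Sum the births for every (AreaCode, Sex) pair; as_index=False keeps
# the grouping keys as ordinary columns instead of moving them to the index
agg = sample.groupby(["AreaCode", "Sex"], as_index=False)["Number Of Births"].sum()
print(agg)
```

To aggregate by year as well, "Year" can be added to the list of grouping keys.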

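However, that is still not the form we want. In a stacked bar chart, each stack layer comes from its own column, so we need separate male and female columns. In other words, we need to pivot the data to split up the Sex column.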
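To make the long-to-wide reshaping concrete, here is a small sketch that pivots an already aggregated frame. The frame `agg` and its values are made up for illustration and simply mirror the question's column names.

```python
import pandas as pd

# Hypothetical aggregated frame: one row per (AreaCode, Sex) pair
agg = pd.DataFrame({
    "AreaCode": [11, 11, 12, 12],
    "Sex": ["M", "W", "M", "W"],
    "Number Of Births": [5, 3, 3, 1],
})

# Long to wide: AreaCode becomes the row index and every distinct
# Sex value becomes its own column holding the birth counts
wide = agg.pivot(index="AreaCode", columns="Sex", values="Number Of Births")
print(wide)
```

If an area had no rows for one of the sexes, the corresponding cell would come out as NaN, which can be replaced with `wide.fillna(0)` before plotting.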

Example code:

import pandas as pd
import matplotlib.pyplot as plt

# read in the data
df = pd.read_csv('text.txt')

# aggregate the birth counts by the columns of interest;
# as_index=False keeps "AreaCode" and "Sex" as ordinary columns
agg_df = df.groupby(['AreaCode', 'Sex'], as_index=False)['Number Of Births'].sum()

# pivot the data, setting AreaCode as the row index, splitting
# 'Sex' into 'M' and 'W' columns, and using the birth counts as the values
piv_df = agg_df.pivot(index='AreaCode', columns='Sex', values='Number Of Births')

# plot as a stacked bar chart
piv_df.plot.bar(stacked=True)
plt.show()
