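Python / Pandas - Combining 3 datasets into one bar chart

Date: 2018-03-12 05:16:31

Tags: python pandas

I'm doing some basic data analysis, and I'm struggling to create a single bar chart from 3 datasets.

Here is my data: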

datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}

datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}

datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}

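Any suggestions on how to turn this into one big bar chart, with each country shown in a different color?

Here is my poor attempt at merging the datasets and plotting them.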

import pandas as pd
import matplotlib.pyplot as plt

df4 = pd.DataFrame.from_dict(datasetArgentina)
df5 = pd.DataFrame.from_dict(datasetColumbia)
df6 = pd.DataFrame.from_dict(datasetBrazil)

df7 = pd.merge(df4, df5, on='Year')
df8 = pd.merge(df6, df7, on='Year', how='left')
print(df7)
print(df8)

plt.bar(df8['Year'], df8['Mortality'])
plt.title('South America')
plt.xticks(df8['Year'], rotation=90)
plt.xlabel('Year')
plt.ylabel('Mortality')
plt.tight_layout()
plt.show()

Any help would be great.

Output:

df7   Mortality_x  Year Mortality_y
0        11000  2000       1500 
1        10000  2001        1600
2        10000  2002        1500
3        10000  2003        1600
4        10000  2004        1500
5         9300  2005        1200
6         8900  2006        1300
7         8700  2007        1400
8         9000  2008        1400
9         8600  2009        1500
10        8300  2010        1500
11        8100  2011        1500
12        7800  2012        1600
13        8000  2013        1500
14        7500  2014        1500
15        7500  2015        1400
16        7300  2016        1400
df8   Mortality  Year Mortality_x Mortality_y
0      11000  2000       11000       1500 
1      10000  2001       10000        1600
2      10000  2002       10000        1500
3      10000  2003       10000        1600
4      10000  2004       10000        1500
5       9300  2005        9300        1200
6       8900  2006        8900        1300
7       8700  2007        8700        1400
8       9000  2008        9000        1400
9       8600  2009        8600        1500
10      8300  2010        8300        1500
11      8100  2011        8100        1500
12      7800  2012        7800        1600
13      8000  2013        8000        1500
14      7500  2014        7500        1500
15      7500  2015        7500        1400
16      7300  2016        7300        1400


2 Answers:

Answer 0: (score: 2)

Use concat to join your dataframes, then use groupby + plot to group and plot them by country:

df = pd.concat(
       [df4, df5, df6], keys=['Argentina', 'Columbia', 'Brazil']
)

df.astype(int).groupby(level=0).plot.bar(x='Year', y='Mortality');
plt.show()

This gives you a separate plot for each group.
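If a single chart with the countries side by side is wanted instead, one possible approach (not part of the original answer) is to pivot the concatenated frame so that each country becomes its own column and then call plot.bar once. The sketch below rebuilds df4, df5 and df6 from the question's values so that it runs on its own; the level name "Country" and the variable name wide are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

years = [str(y) for y in range(2000, 2017)]
# Values copied from the question, kept as strings like the original dicts.
argentina = ("11000 10000 10000 10000 10000 9300 8900 8700 9000 "
             "8600 8300 8100 7800 8000 7500 7500 7300").split()
columbia = ("1500 1600 1500 1600 1500 1200 1300 1400 1400 "
            "1500 1500 1500 1600 1500 1500 1400 1400").split()
brazil = list(argentina)  # the question's Brazil figures match Argentina's

df4 = pd.DataFrame({"Year": years, "Mortality": argentina})
df5 = pd.DataFrame({"Year": years, "Mortality": columbia})
df6 = pd.DataFrame({"Year": years, "Mortality": brazil})

# Stack the frames; the keys become an outer index level named "Country"
# (the name is an assumption chosen for this sketch).
df = pd.concat([df4, df5, df6],
               keys=["Argentina", "Columbia", "Brazil"],
               names=["Country"])
df["Mortality"] = df["Mortality"].astype(int)

# Reshape to one row per year and one column per country.
wide = (df.reset_index(level="Country")
          .pivot(index="Year", columns="Country", values="Mortality"))

# A single call draws one group of bars per year, one color per country.
ax = wide.plot.bar(figsize=(12, 5), title="South America")
ax.set_xlabel("Year")
ax.set_ylabel("Mortality")
plt.tight_layout()
plt.show()
```

Because plot.bar on a wide frame draws one bar series per column, the legend and the per-country colors come for free; the pivot is the only extra step compared with the groupby version.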

Answer 1: (score: 1)

You can use seaborn's factorplot, as follows:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

%matplotlib inline

datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}

datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}

datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}


df4 = pd.DataFrame(datasetArgentina)
df5 = pd.DataFrame(datasetColumbia)
df6 = pd.DataFrame(datasetBrazil)

Additional code:

# add country field for each dataframe
df4['country'] = 'Argentina'    
df5['country'] = 'Columbia'
df6['country'] = 'Brazil'

# Combine all dataframes
df = pd.concat([df4,df5,df6])
# convert to float
df['Mortality'] = df['Mortality'].astype(float)

sns.factorplot(data=df, hue='country', x='Year', y='Mortality', kind='bar', ci=None, aspect=3, size=7);
plt.xticks(rotation=45);

Result (for more information, see the seaborn factorplot documentation):

result image
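A note for readers on newer seaborn releases: factorplot was renamed to catplot in seaborn 0.9 and its size argument became height, and as far as I know the old name has since been dropped. The following is a sketch of the same plot under that assumption, not the answer's original code. It rebuilds the data from the question's values so it runs outside a notebook, and it leaves out ci=None because the name of that argument has changed between releases.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

years = [str(y) for y in range(2000, 2017)]
# Values copied from the question, kept as strings like the original dicts.
argentina = ("11000 10000 10000 10000 10000 9300 8900 8700 9000 "
             "8600 8300 8100 7800 8000 7500 7500 7300").split()
columbia = ("1500 1600 1500 1600 1500 1200 1300 1400 1400 "
            "1500 1500 1500 1600 1500 1500 1400 1400").split()
brazil = list(argentina)  # the question's Brazil figures match Argentina's

# Build one long frame with a country column, as in the answer above.
frames = [
    pd.DataFrame({"Year": years, "Mortality": values, "country": name})
    for name, values in [("Argentina", argentina),
                         ("Columbia", columbia),
                         ("Brazil", brazil)]
]
df = pd.concat(frames, ignore_index=True)
df["Mortality"] = df["Mortality"].astype(float)

# catplot replaces factorplot; height replaces size.
g = sns.catplot(data=df, x="Year", y="Mortality", hue="country",
                kind="bar", height=7, aspect=3)
plt.xticks(rotation=45)
plt.show()
```

Each bar here is backed by a single observation, so the confidence-interval setting has little visible effect either way.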