Making a grouped bar chart with Matplotlib (comparing two variables)

Time: 2017-09-21 15:46:16

Tags: python pandas matplotlib

This is a very basic question, but I'm stuck.

I am trying to make a chart like this:

[image: example of the desired grouped bar chart]

I have the following DataFrame:

   duration  start_date  start_year  start_month  start_hour  weekday     start_city       end_city subscription_type
0  1.050000  2013-08-29        2013            8          14        3  San Francisco  San Francisco        Subscriber
1  1.166667  2013-08-29        2013            8          14        3       San Jose       San Jose        Subscriber
2  1.183333  2013-08-29        2013            8          10        3  Mountain View  Mountain View        Subscriber
3  1.283333  2013-08-29        2013            8          11        3       San Jose       San Jose        Subscriber
4  1.383333  2013-08-29        2013            8          12        3  San Francisco  San Francisco        Subscriber

I am using this code:

viagem_por_tipo = trip_data.groupby(['subscription_type'],['weekday'])['start_year'].count()
viagem_por_tipo.plot.bar()

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-40-4b3318e38ba7> in <module>()
----> 1 viagem_por_tipo = trip_data.groupby(['subscription_type'],['weekday'])['start_year'].count()
      2 viagem_por_tipo.plot.bar()

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\core\generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   4266         if level is None and by is None:
   4267             raise TypeError("You have to supply one of 'by' and 'level'")
-> 4268         axis = self._get_axis_number(axis)
   4269         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   4270                        sort=sort, group_keys=group_keys, squeeze=squeeze,

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_axis_number(self, axis)
    339 
    340     def _get_axis_number(self, axis):
--> 341         axis = self._AXIS_ALIASES.get(axis, axis)
    342         if is_integer(axis):
    343             if axis in self._AXIS_NAMES:

TypeError: unhashable type: 'list'

Can anyone help me?

Thanks in advance.

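1 answer:

Answer 0 (score: 1)

Note that groupby() expects all grouping columns in a single list: df.groupby(['subscription_type','weekday']). In the question's call, the second list ['weekday'] is passed positionally as the axis argument, which is why pandas raises "TypeError: unhashable type: 'list'" inside _get_axis_number.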
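To make the difference concrete, here is a minimal sketch on a tiny made-up frame (the values are illustrative only, not the asker's trip data). The failing call is left as a comment; the working call passes one list that holds both column names.

```python
import pandas as pd

# Illustrative data only; not the asker's actual trips.
trips = pd.DataFrame({
    "subscription_type": ["Subscriber", "Subscriber", "Customer", "Subscriber", "Customer"],
    "weekday": [3, 3, 3, 4, 4],
    "start_year": [2013, 2013, 2013, 2013, 2013],
})

# Wrong: trips.groupby(['subscription_type'], ['weekday'])
# The second list lands in a different positional parameter, not in the keys.

# Right: a single list with both grouping columns.
counts = trips.groupby(["subscription_type", "weekday"])["start_year"].count()
print(counts)
```

The result is a Series indexed by (subscription_type, weekday) pairs, which is the shape the next step reshapes for plotting.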

Then you need to pivot the grouped DataFrame so that each subscription type becomes its own column.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Random sample data standing in for the trip records
df = pd.DataFrame({"subscription_type" : np.random.choice(["Subscriber","Customer"], size=240),
                   "weekday" : np.random.randint(1,8,size=240),
                   "start_year" : np.ones(240)*2013 })

# Count the rows in each (subscription_type, weekday) group
gr = df.groupby(['subscription_type','weekday'])['start_year'].size().reset_index(name="Count")

# Pivot so that weekdays are rows and subscription types are columns
piv = pd.pivot_table(gr, values='Count', columns=['subscription_type'],
                     index="weekday", aggfunc="sum", fill_value=0)

# One group of bars per weekday, one bar per subscription type
piv.plot(kind="bar")

plt.show()

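As a possible shortcut, the same wide table can probably be built without the intermediate reset_index and pivot_table steps by calling unstack on the grouped sizes. This is a sketch of an alternative, not the method used above; the seeded random data is an assumption made only so the example is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
df = pd.DataFrame({
    "subscription_type": rng.choice(["Subscriber", "Customer"], size=240),
    "weekday": rng.integers(1, 8, size=240),
})

# size() counts rows per (weekday, subscription_type) pair;
# unstack moves subscription_type into the columns, filling gaps with 0.
wide = (
    df.groupby(["weekday", "subscription_type"])
      .size()
      .unstack("subscription_type", fill_value=0)
)
print(wide)
# wide.plot(kind="bar") would then draw one group of bars per weekday.
```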

It may also be worth noting that seaborn's countplot gives a similar result directly from the unaggregated data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({"subscription_type" : np.random.choice(["Subscriber","Customer"], size=240),
                   "weekday" : np.random.randint(1,8,size=240),
                   "start_year" : np.ones(240)*2013 })

sns.countplot(x="weekday", hue="subscription_type", data=df)

plt.show()

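For completeness, a grouped bar chart can also be drawn with plain Matplotlib by shifting each series half a bar width to either side of the tick position. The counts below are hypothetical numbers chosen for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical counts per weekday for each subscription type.
weekdays = np.arange(1, 8)
subscriber = np.array([20, 18, 22, 17, 19, 12, 10])
customer = np.array([8, 9, 7, 10, 11, 21, 23])

width = 0.4  # each bar takes up 0.4 of the unit spacing between ticks
fig, ax = plt.subplots()
# Offset the two series left and right of each weekday tick.
bars_sub = ax.bar(weekdays - width / 2, subscriber, width, label="Subscriber")
bars_cus = ax.bar(weekdays + width / 2, customer, width, label="Customer")

ax.set_xticks(weekdays)
ax.set_xlabel("weekday")
ax.set_ylabel("Count")
ax.legend(title="subscription_type")
# Call plt.show() in an interactive session, or fig.savefig(...) to write a file.
```

The pandas and seaborn versions above handle the offsets automatically; the manual form is mainly useful when finer control over bar placement is needed.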