Question

Let's say I have a pandas DataFrame with many features, two of which I'm interested in. I'll call them feature1 and feature2.

feature1 can take three possible values. feature2 can take two possible values.

I need a bar chart grouped by feature1 that shows the row count for each value of feature2 (so there would be three stacks, each made of two bars).

How can I achieve this?

Currently I have:

import pandas as pd
df = pd.read_csv('data.csv')
df['feature1'][df['feature2'] == 0].value_counts().plot(kind='bar',label='0')
df['feature1'][df['feature2'] == 1].value_counts().plot(kind='bar',label='1')

But this isn't really what I want, because it doesn't stack them.

Answer 1

Alternatively, I found another way (using pandas):

df.groupby(['feature1', 'feature2']).size().unstack().plot(kind='bar', stacked=True)

Source: making a stacked barchart in pandas
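A self-contained sketch of the same one-liner, assuming a small made-up DataFrame in place of data.csv (the feature1/feature2 values below are hypothetical). It also passes fill_value=0 to unstack so that a feature1/feature2 combination that never occurs becomes a zero-height segment instead of NaN; that argument is an addition, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample data standing in for data.csv:
# feature1 takes three values, feature2 takes two.
df = pd.DataFrame({
    "feature1": ["A", "A", "B", "B", "B", "C", "C", "A"],
    "feature2": [0, 1, 0, 0, 1, 1, 1, 0],
})

# Count rows per (feature1, feature2) pair, then move feature2 into columns.
# fill_value=0 covers pairs that never occur (here: C with feature2 == 0).
counts = df.groupby(["feature1", "feature2"]).size().unstack(fill_value=0)

# One bar per feature1 value, one stacked segment per feature2 value.
ax = counts.plot(kind="bar", stacked=True)
ax.set_ylabel("row count")
```

Each row of counts becomes one bar and each column one segment, which is why the reshaping step matters more than the plotting call itself.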

Answer 2

I'm not sure how to do this in matplotlib (pandas' default plotting library), but if you're willing to try a different plotting library, it's easy with Bokeh.

Here is an example:

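import pandas as pd
# Note: bokeh.charts (including Bar) only exists in old Bokeh releases; it was
# deprecated and later removed, so this example needs one of those versions.
from bokeh.charts import Bar, output_file, show

# One row per (class, gender) pair with its enrolment count.
x = pd.DataFrame({"gender": ["m","f","m","f","m","f"],
                  "enrolments": [500,20,100,342,54,47],
                  "class": ["comp-sci", "comp-sci",
                            "psych", "psych",
                            "history", "history"]})

# label= gives the bar categories, stack= the segments within each bar.
bar = Bar(x, values='enrolments', label='class', stack='gender',
          title="Number of students enrolled per class",
          legend='top_right', bar_width=1.0)
output_file("myPlot.html")  # write the chart to a standalone HTML file
show(bar)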
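For comparison, the same enrolment data can be drawn as a stacked bar chart with pandas' own matplotlib-backed plotting. This is a sketch, not part of the original answer: because each (class, gender) pair occurs exactly once here, pivot is enough to reshape the data; if pairs repeated, pivot_table with an aggregation such as aggfunc="sum" would be needed instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

x = pd.DataFrame({"gender": ["m", "f", "m", "f", "m", "f"],
                  "enrolments": [500, 20, 100, 342, 54, 47],
                  "class": ["comp-sci", "comp-sci",
                            "psych", "psych",
                            "history", "history"]})

# Reshape to one row per class and one column per gender;
# pivot sorts both the row and column labels.
table = x.pivot(index="class", columns="gender", values="enrolments")

# Each class becomes one bar, each gender one stacked segment.
ax = table.plot(kind="bar", stacked=True,
                title="Number of students enrolled per class")
```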

Answer 3

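size produces a plain row count for each group, and those counts become the y-axis values. unstack then produces the row and column layout that matplotlib needs to draw a stacked bar chart.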
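To make the two steps concrete, here is a sketch on hypothetical data that runs size and unstack separately and then draws the stack by hand with matplotlib's Axes.bar and its bottom argument. The manual loop illustrates the idea behind stacked=True; it is not pandas' actual implementation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "feature1": ["A", "A", "A", "B", "B", "C"],
    "feature2": [0, 0, 1, 0, 1, 1],
})

# size(): one row count per (feature1, feature2) group, returned as a
# Series with a two-level MultiIndex.
sizes = df.groupby(["feature1", "feature2"]).size()

# unstack(): move the inner level (feature2) into columns, giving one row
# per bar and one column per stacked segment.
table = sizes.unstack(fill_value=0)

# Stacking by hand: each column is drawn on top of the running total of
# the columns before it.
fig, ax = plt.subplots()
bottom = [0] * len(table)
for col in table.columns:
    ax.bar(list(table.index), table[col], bottom=bottom, label=str(col))
    bottom = [b + v for b, v in zip(bottom, table[col])]
ax.legend(title="feature2")
```

After the loop, bottom holds the total height of each bar, which equals the number of rows for that feature1 value.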

Basically, it takes

>>> s
one  a   1.0
     b   2.0
two  a   3.0
     b   4.0

and produces:

>>> s.unstack(level=-1)
     a   b
one  1.0  2.0
two  3.0  4.0
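The example Series above can be rebuilt and unstacked directly. A minimal sketch; building the index with MultiIndex.from_product is an assumption, since the answer only shows the printed Series:

```python
import pandas as pd

# Rebuild the two-level Series from the example: (one, a), (one, b), (two, a), (two, b).
index = pd.MultiIndex.from_product([["one", "two"], ["a", "b"]])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# level=-1 (the innermost level, a/b) becomes the columns.
wide = s.unstack(level=-1)

# level=0 would instead move the outer level (one/two) into the columns.
by_outer = s.unstack(level=0)
```

Which level is unstacked decides what ends up as the bars (rows) and what ends up as the stacked segments (columns) when the result is plotted.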

Stacked bar chart from grouped data with pandas
