Question

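I'm having the hardest time solving this problem. I have a DataFrame with several categorical fields, and I'd like to plot each of them as a histogram with the target variable (Income) overlaid. I was hoping to do the histograms with Pandas and iterate over all the fields, but when I try to plot Race and overlay Income, there is no legend, and I can't seem to get the Income values to stack on top of one another.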

Below is a sample DataFrame similar to mine, along with the latest thing I've tried.

import pandas as pd

exampledf = {'Race': ['Black', 'White', 'Asian', 'White',
                      'White', 'Asian', 'White', 'White',
                      'White', 'Black', 'White', 'Asian'],
             'Income': ['>=50k', '>=50k', '>=50k', '>=50k',
                        '>=50k', '<50k', '<50k', '>=50k',
                        '>=50k', '>=50k', '<50k', '>=50k'],
             'Gender': ['M', 'F', 'F', 'F',
                        'M', 'M', 'M', 'M',
                        'M', 'M', 'M', 'M']}
exampledf = pd.DataFrame(exampledf)
# latest attempt: no usable legend, and the Income groups are not stacked
exampledf.groupby(['Income','Race']).size().plot(x=exampledf['Race'], kind='bar', color=['r','b'], logy=False, legend=True)

Answer 1

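The way you are calling plot is incorrect. With pandas you don't pass an x variable for a bar chart; it automatically uses the index as the x-axis. However, because you have a MultiIndex, it probably won't give you the chart you want.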
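
To make the MultiIndex point concrete, here is a minimal sketch that rebuilds the question's Race and Income columns and inspects what groupby(...).size() returns. The variable name counts is only illustrative. Each entry is keyed by an (Income, Race) pair, so a bar plot of this Series draws one bar per pair instead of grouping the Income values under each Race.

```python
import pandas as pd

# Race and Income columns copied from the question's sample data
exampledf = pd.DataFrame({
    'Race': ['Black', 'White', 'Asian', 'White',
             'White', 'Asian', 'White', 'White',
             'White', 'Black', 'White', 'Asian'],
    'Income': ['>=50k', '>=50k', '>=50k', '>=50k',
               '>=50k', '<50k', '<50k', '>=50k',
               '>=50k', '>=50k', '<50k', '>=50k'],
})

# size() returns a Series whose index has two levels: Income and Race
counts = exampledf.groupby(['Income', 'Race']).size()

print(counts.index.names)  # the two index levels
print(counts)              # one row per (Income, Race) pair that occurs
```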

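To create a bar chart of Race vs. Income, you need Race as the index (rows), Income as the columns, and the counts as the values. You don't want a groupby; you want to pivot the data. In this case, use .pivot_table.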

This creates a new DataFrame with Race as the index (the x values for pandas .plot) and the different Income values as columns (the y values for .plot).

pt = exampledf[['Race','Income']].pivot_table(index='Race', columns='Income', 
                                              aggfunc=len, fill_value=0) 
# output of pt:
# Income  <50k  >=50k
# Race
# Asian      1      2
# Black      0      2
# White      2      5

# make the plot
pt.plot.bar()
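
The question also asked about stacking the Income counts and iterating over every categorical field. The following is a sketch under stated assumptions, not part of the original answer: it swaps in pd.crosstab to build the same kind of count table, passes stacked=True to the bar plot, and loops over a hypothetical list named categorical_fields. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# sample data copied from the question
exampledf = pd.DataFrame({
    'Race': ['Black', 'White', 'Asian', 'White',
             'White', 'Asian', 'White', 'White',
             'White', 'Black', 'White', 'Asian'],
    'Income': ['>=50k', '>=50k', '>=50k', '>=50k',
               '>=50k', '<50k', '<50k', '>=50k',
               '>=50k', '>=50k', '<50k', '>=50k'],
    'Gender': ['M', 'F', 'F', 'F',
               'M', 'M', 'M', 'M',
               'M', 'M', 'M', 'M'],
})

# hypothetical list of the categorical fields to plot against Income
categorical_fields = ['Race', 'Gender']

# squeeze=False keeps axes two-dimensional even for a single field
fig, axes = plt.subplots(1, len(categorical_fields),
                         figsize=(10, 4), squeeze=False)

tables = {}
for ax, field in zip(axes[0], categorical_fields):
    # rows: categories of the field, columns: Income values, cells: counts
    table = pd.crosstab(exampledf[field], exampledf['Income'])
    tables[field] = table
    # stacked=True piles the Income counts on top of each other per category
    table.plot.bar(stacked=True, ax=ax, title=field)

fig.tight_layout()
```

Each subplot gets a legend keyed by the Income column labels, and calling fig.savefig with a file name of your choice writes the figure to disk.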

Here is the image as rendered in IPython. It looks better with the default settings in Jupyter Notebook.

Answer 2

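James's pure-pandas answer is most likely what you're looking for, but I have been turning more and more to altair for visualizing DataFrames because of its remarkable conciseness.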

Just assign the DataFrame's columns to the chart's dimensions and you get what you need:

from altair import Chart

# Altair 2+ writes the count aggregate as 'count()'; Altair 1.x used 'count(*)'
Chart(exampledf).mark_bar(
).encode(
    y='Race',
    x='count()',
    color='Income'
)

Or:

# one panel per Race, with Income on the x-axis
# (Altair 1.x spelled the aggregate 'count(*)')
Chart(exampledf).mark_bar(
).encode(
    column='Race',
    y='count()',
    x='Income'
)

Overlaying multiple histograms from a Pandas DataFrame
