Question

How do I plot histograms with pandas DataFrame.hist() using group by? I have a DataFrame with 5 columns: "A", "B", "C", "D" and "Group".

There are two group classes: "yes" and "no".

Using:

df.hist()

I get a histogram for each of the 4 columns.

Now I would like to get the same 4 graphs, but with blue bars (group = "yes") and red bars (group = "no").

I tried this, without success:

df.hist(by = "group")

Answer 1

Using Seaborn

If you are open to using Seaborn, seaborn.FacetGrid makes it easy to build a figure with several subplots and several variables in each subplot.

import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)

df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')

bins=np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")

g.axes[-1].legend()
plt.show()
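The key step above is pd.melt, which reshapes the wide frame (one column per variable) into long format, so that FacetGrid can use the "variable" column for the panels and the "group" column for the hue. A small sketch of what that reshaping produces, using a tiny made-up frame (the numbers are illustrative only):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame: 3 rows, columns A-D plus the grouping column
df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), columns=list("ABCD"))
df["group"] = ["yes", "no", "yes"]

# Wide -> long: one row per (original row, variable) pair.
# The new column is called "variable" by default (var_name), which is
# why the FacetGrid above uses col="variable".
long_df = pd.melt(df, id_vars="group", value_vars=list("ABCD"), value_name="value")

print(long_df.head())
```

Each of the 3 original rows turns into 4 rows, one per column A to D, so the long frame has 12 rows with the columns group, variable and value.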

Answer 2

This is not the most flexible workaround, but it works for your question specifically.

import matplotlib.pyplot as plt

def sephist(col):
    # Split one column into its "yes" and "no" groups
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

# Subplot indices start at 1, so enumerate from 1; columns are upper case
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    plt.hist(yes, bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(no, bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
plt.show()

You can make it more general by:

adding df and by parameters to sephist: def sephist(df, by, col)
making the subplot loop more flexible: for num, alpha in enumerate(df.columns, start=1)
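A minimal sketch of that more general version, assuming every column other than the grouping column is numeric. The helper names sephist and plot_grouped_hists and the ncols parameter are illustrative choices, not pandas or matplotlib API; groupby is used so the grouping column may hold any number of labels, and the data are random placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def sephist(df, by, col):
    """Return {group label: values of col for that group} (hypothetical helper)."""
    return {label: sub[col] for label, sub in df.groupby(by)}


def plot_grouped_hists(df, by, bins=25, ncols=2):
    """One subplot per non-grouping column, one overlaid histogram per group."""
    cols = [c for c in df.columns if c != by]
    nrows = int(np.ceil(len(cols) / ncols))
    fig, axes = plt.subplots(nrows, ncols, squeeze=False)
    for ax, col in zip(axes.flat, cols):
        # Shared bin edges so the groups in one panel are directly comparable
        edges = np.linspace(df[col].min(), df[col].max(), bins + 1)
        for label, values in sephist(df, by, col).items():
            ax.hist(values, bins=edges, alpha=0.5, label=str(label))
        ax.set_title(col)
        ax.legend(loc="upper right")
    # Hide any leftover axes when the columns do not fill the grid
    for ax in axes.flat[len(cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Usage with placeholder data shaped like the question's frame
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((300, 4)), columns=list("ABCD"))
df["group"] = rng.choice(["yes", "no"], size=300)
fig, axes = plot_grouped_hists(df, by="group")
```

Since groupby sorts the labels, "no" is drawn before "yes"; pass explicit colors to ax.hist if you need the blue/red assignment from the question.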

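Because the first argument of matplotlib.pyplot.hist can take

either a single array or a sequence of arrays which are not required to be of the same length

...an alternative is: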

for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    yes, no = sephist(alpha)
    # One call with a sequence of two arrays: bars are drawn side by side
    plt.hist((yes, no), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
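To make the point about the first argument of plt.hist concrete: passing a list of arrays with different lengths draws one set of bars per array in a single call, and the returned counts contain one entry per dataset. A small sketch with made-up data (the Agg backend is selected only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
yes = rng.normal(0.0, 1.0, size=100)  # 100 values
no = rng.normal(1.0, 1.0, size=40)    # 40 values: lengths need not match

fig, ax = plt.subplots()
# One call, two datasets: bins span the range of all the data combined,
# and the bars of the two datasets sit side by side in each bin
n, edges, patches = ax.hist([yes, no], bins=10, color=["b", "r"], label=["yes", "no"])
ax.legend(loc="upper right")
```

Here n holds two count arrays, and each sums to the length of its dataset because the default range covers every value.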

Answer 3

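I generalized one of the solutions from the other answers. Hope it helps someone out there. I added a line to make sure the binning (number and range of bins) stays the same for each column, regardless of group. The code should work for both "binary" and "categorical" groupings, i.e. "by" can name a column with N unique groups. It also stops plotting once the number of columns exceeds the available subplot space.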

import numpy as np
import matplotlib.pyplot as plt

def composite_histplot(df, columns, by, nbins=25, alpha=0.5):
    def _sephist(df, col, by):
        # Map each unique value of `by` to the matching slice of `col`
        return {uv: df.loc[df[by] == uv, col] for uv in df[by].unique()}

    nrows, ncols = 4, 5
    fig = plt.figure()
    for num, col in enumerate(columns):
        # Stop once the nrows x ncols grid of subplots is full
        if num >= nrows * ncols:
            break
        plt.subplot(nrows, ncols, num + 1)
        # Same bin edges for every group of this column
        # (nbins + 1 edges give exactly nbins bins)
        bins = np.linspace(df[col].min(), df[col].max(), nbins + 1)
        for lbl, sepcol in _sephist(df, col, by).items():
            plt.hist(sepcol, bins=bins, alpha=alpha, label=lbl)
        # Legend and title once per subplot, after all groups are drawn
        plt.legend(loc='upper right', title=by)
        plt.title(col)
    plt.tight_layout()

    return fig
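Why the shared bins line matters: every group in a panel is counted against the same edges, so bar heights can be compared bin by bin. A NumPy-only sketch of that idea with a tiny made-up frame; grouped_counts is a hypothetical helper, not part of the answer above:

```python
import numpy as np
import pandas as pd


def grouped_counts(df, col, by, nbins=25):
    """Histogram counts of col per group, all using the same bin edges."""
    # nbins + 1 edges -> nbins bins spanning the full column range
    edges = np.linspace(df[col].min(), df[col].max(), nbins + 1)
    counts = {
        label: np.histogram(sub[col], bins=edges)[0]
        for label, sub in df.groupby(by)
    }
    return edges, counts


df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "g": ["a", "a", "b", "b", "c", "c"],
})
edges, counts = grouped_counts(df, "x", "g", nbins=5)
print(counts)
```

With edges 0, 1, ..., 5, group "a" falls in the first two bins and group "c" fills only the last one (np.histogram includes the right edge in the last bin). If each group computed its own min/max instead, the same bar position would mean a different value range for each group.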
