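I have a dataset with 50 numeric variables and 1 categorical variable (segment_hc_print, which has 6 categories). I want to see the distribution of each variable within each category by plotting a grid of histograms, where each row represents a category, each column represents a variable, and each cell in the grid is a histogram. I am trying the code below to generate the grid for a single variable: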
    import numpy as np
    import matplotlib.pyplot as plt

    def grid_histogram(variable, bins):
        fig = plt.figure(figsize=(20,10))
        fig.set_size_inches(10,10, forward = True)
        fig.suptitle(variable, fontsize = 8)
        plt.locator_params(numticks = 4)
        # one subplot per category, stacked vertically
        for i in np.arange(0, 6, 1):
            ax = plt.subplot(6,1,i+1)
            ax.hist(sensor_df_print_sample_v2[sensor_df_print_sample_v2.segment_hc_print == i][variable], bins)
            ax.set_title("cluster = " + str(i), fontsize = 5)
            # keep only three tick marks on each axis
            ymin, ymax = ax.get_ylim()
            ax.set_yticks(np.round(np.linspace(ymin, ymax, 3), 2))
            xmin, xmax = ax.get_xlim()
            ax.set_xticks(np.round(np.linspace(xmin, xmax,3),2))
            plt.setp(ax.get_xticklabels(), rotation = 'vertical', fontsize = 4)
        fig.tight_layout()
        fig.savefig(str(variable) + '_histogram.pdf')
        plt.show()
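This is what I get: (image: sample histogram)
How can I generate a grid of such histograms, with each variable's histograms placed to the right of the previous variable's? The code below generates the ideal histogram size that I need. (image: sample histogram)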
Answer 0 (score: 1)
If I understand correctly, you can create a grid with plt.subplots(). In the example below, I plot the first 5 variables as columns:
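    import numpy as np
    import matplotlib.pyplot as plt

    nr_of_categories = 6
    nr_of_variables = 5
    bins = 20  # assumed value; use whatever you passed to grid_histogram
    # assumes every column other than segment_hc_print is numeric
    variables = sensor_df_print_sample_v2.columns.drop('segment_hc_print')[:nr_of_variables]
    fig, ax = plt.subplots(nrows = nr_of_categories, ncols = nr_of_variables, figsize = (20, 20))
    for category in np.arange(0, nr_of_categories):
        for col, variable in enumerate(variables):
            ax[category, col].hist(sensor_df_print_sample_v2[sensor_df_print_sample_v2.segment_hc_print == category][variable], bins)
            # and then the rest of your code where you replace ax with ax[category, col]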
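
For reference, here is a self-contained sketch of the same idea that runs without the original data. The function name histogram_grid, the synthetic demo_df, the column names var_0 to var_4, and the default of 20 bins are all made up for illustration; only segment_hc_print and the 6-by-5 layout come from the question. It assumes the category column holds the values 0 to 5.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def histogram_grid(df, category_col, variables, bins=20):
    """Draw one histogram per (category, variable) pair.

    Rows of the grid are categories, columns are variables.
    """
    categories = sorted(df[category_col].unique())
    fig, axes = plt.subplots(
        nrows=len(categories),
        ncols=len(variables),
        figsize=(3 * len(variables), 2 * len(categories)),
        squeeze=False,  # always get a 2-D array of axes, even for one row or column
    )
    for row, category in enumerate(categories):
        subset = df[df[category_col] == category]  # rows belonging to this category
        for col, variable in enumerate(variables):
            ax = axes[row, col]
            ax.hist(subset[variable], bins=bins)
            ax.set_title(f"{variable}, cluster = {category}", fontsize=6)
            ax.tick_params(labelsize=5)
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in for sensor_df_print_sample_v2 (hypothetical data).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(
    rng.normal(size=(600, 5)),
    columns=[f"var_{i}" for i in range(5)],
)
demo_df["segment_hc_print"] = np.arange(600) % 6  # six categories, 0..5

fig, axes = histogram_grid(demo_df, "segment_hc_print", list(demo_df.columns[:5]))
```

With the real data you would pass sensor_df_print_sample_v2 and a list of its column names instead, then call fig.savefig(...) on the returned figure. For all 50 variables a single figure gets very wide, so it is probably easier to call the function on batches of five or ten columns.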