Question

I have the following tab-separated data:

CHROM   ms02g:PI    num_Vars_by_PI  range_of_PI total_haplotypes    total_Vars
1   1,2 60,6    2820,81 2   66
2   9,8,10,7,11 94,78,10,69,25  89910,1102167,600,1621365,636   5   276
3   5,3,4,6 6,12,14,17  908,394,759,115656  4   49
4   17,18,22,16,19,21,20    22,11,3,16,7,12,6   1463,171,149,256,157,388,195    7   77
5   13,15,12,14 56,25,96,107    2600821,858,5666,1792   4   284
7   24,26,29,25,27,23,30,28,31  12,31,19,6,12,23,9,37,25    968,3353,489,116,523,1933,823,2655,331  9   174
8   33,32   53,35   1603,2991338    2   88
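To reproduce the problem without the original input file, the table above can be loaded into a DataFrame. This is a minimal sketch, assuming the data really is tab-separated as stated. The inline string and the loading step are assumptions, since the question does not show how `hap_stats` is read. Only the first three rows are used here.

```python
import io

import pandas as pd

# First three rows of the table above, written out as tab-separated text.
raw = (
    "CHROM\tms02g:PI\tnum_Vars_by_PI\trange_of_PI\ttotal_haplotypes\ttotal_Vars\n"
    "1\t1,2\t60,6\t2820,81\t2\t66\n"
    "2\t9,8,10,7,11\t94,78,10,69,25\t89910,1102167,600,1621365,636\t5\t276\n"
    "3\t5,3,4,6\t6,12,14,17\t908,394,759,115656\t4\t49\n"
)

# sep="\t" keeps the comma-separated lists inside each cell intact as strings.
hap_stats = pd.read_csv(io.StringIO(raw), sep="\t")

# Each num_Vars_by_PI cell is a string such as "60,6"; split it into ints.
var_counts = hap_stats["num_Vars_by_PI"].apply(
    lambda s: [int(x) for x in s.split(",")]
)
print(var_counts.tolist())
```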

I am using this code to build a histogram with one subplot per CHROM:

import matplotlib.pyplot as plt

with open(outputdir + '/' + 'hap_size_byVar_'+ soi +'_'+ prefix+'.png', 'wb') as fig_initial:
    fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
    for i, data in hap_stats.iterrows():

        # first convert data to list of integers
        data_i = [int(x) for x in data['num_Vars_by_PI'].split(',')]
        ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
        ax[i].legend()

    plt.xlabel('size of the haplotype (number of variants)')
    plt.ylabel('frequency of the haplotypes')
    plt.suptitle('histogram of size of the haplotype (number of variants) \n'
                 'for each chromosome')
    plt.savefig(fig_initial)

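Everything works fine except for two problems:

The y label "frequency of the haplotypes" is not positioned correctly in the output figure.

When the data contains only one row (see the data below), creating the subplots fails and I get a TypeError, even though it should be able to make a subplot group with only one index.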

A DataFrame containing only one row of data:

 CHROM  ms02g:PI    num_Vars_by_PI  range_of_PI total_haplotypes    total_Vars
 2  9,8,10,7,11 94,78,10,69,25  89910,1102167,600,1621365,636   5   276

TypeError:

Traceback (most recent call last):
  File "phase-Extender.py", line 1806, in <module>
    main()
  File "phase-Extender.py", line 502, in main
    compute_haplotype_stats(initial_haplotype, soi, prefix='initial')
  File "phase-Extender.py", line 1719, in compute_haplotype_stats
    ax[i].hist(data_i, label=str(data['CHROM']), alpha=0.5)
TypeError: 'AxesSubplot' object does not support indexing

How can I fix these two problems?

Answer 1

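Your first problem comes from the fact that you call plt.ylabel() at the end of your loop. pyplot functions act on the currently active Axes object, which in this case is the last one created by subplots(). If you want the label centered across all the subplots, the simplest approach is probably to create a text object that is vertically centered in the figure.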

# replace plt.ylabel('frequency of the haplotypes') with:
fig.text(.02, .5, 'frequency of the haplotypes', ha='center', va='center', rotation='vertical')

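You can play with the x position (0.02) until you find a placement you are happy with. The coordinates are figure coordinates, with (0,0) at the bottom left and (1,1) at the top right. Using 0.5 as the y position ensures the label is vertically centered in the figure.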
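Put together, the fig.text() fix might look like the following. This is a minimal sketch, assuming hypothetical per-chromosome counts in a plain dict (`counts`) in place of `hap_stats`, and rendering into an in-memory buffer instead of the question's output file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-in for hap_stats: CHROM -> number of variants per haplotype.
counts = {1: [60, 6], 2: [94, 78, 10, 69, 25], 3: [6, 12, 14, 17]}

fig, ax = plt.subplots(nrows=len(counts), sharex=True)
for axis, (chrom, values) in zip(ax, counts.items()):
    axis.hist(values, label=str(chrom), alpha=0.5)
    axis.legend()

# plt.xlabel targets only the current (last) Axes. For the x label that is
# the bottom subplot, which is where the shared x axis is labelled anyway.
plt.xlabel("size of the haplotype (number of variants)")

# Figure coordinates: (0, 0) is bottom-left, (1, 1) is top-right.
ylabel = fig.text(0.02, 0.5, "frequency of the haplotypes",
                  ha="center", va="center", rotation="vertical")
fig.suptitle("histogram of size of the haplotype (number of variants)\n"
             "for each chromosome")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

On Matplotlib 3.4 or newer, `fig.supylabel("frequency of the haplotypes")` is a built-in alternative that centers a y label across all subplots. It does not exist in older versions, where fig.text() is the portable choice.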

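The second problem is caused by the fact that, with nrows=1, plt.subplots() returns the Axes object directly instead of a list of Axes. There are two ways to fix this.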
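The difference in return type can be checked directly. This is a minimal sketch, assuming the default `squeeze=True`. The printed class name is left unannotated because it varies between Matplotlib versions (`AxesSubplot` in older releases, `Axes` in newer ones).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

fig1, ax1 = plt.subplots(nrows=1)  # default squeeze=True: a single Axes
fig3, ax3 = plt.subplots(nrows=3)  # a 1D NumPy array of three Axes

print(type(ax1).__name__)
print(type(ax3).__name__, ax3.shape)

# Indexing the lone Axes is what the question's loop does when nrows == 1.
indexing_failed = False
try:
    ax1[0]
except TypeError:
    indexing_failed = True
```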

1 - Test whether you have only one row, and if so wrap ax in a list:

fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
if len(hap_stats)==1:
    ax = [ax]
(...)
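A self-contained version of option 1 follows. This is a minimal sketch, assuming a hypothetical one-row `hap_stats` built from the single-row data in the question. One deliberate change from the question's loop: it uses `enumerate()` to get the subplot position. `iterrows()` yields index labels, which match positions 0..n-1 only when the DataFrame has a default index.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical one-row DataFrame mirroring the failing case in the question.
hap_stats = pd.DataFrame({"CHROM": [2], "num_Vars_by_PI": ["94,78,10,69,25"]})

fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True)
if len(hap_stats) == 1:
    ax = [ax]  # wrap the lone Axes so ax[i] works as in the multi-row case

# enumerate() gives positions 0..n-1 regardless of the DataFrame's index labels
for i, (_, data) in enumerate(hap_stats.iterrows()):
    data_i = [int(x) for x in data["num_Vars_by_PI"].split(",")]
    ax[i].hist(data_i, label=str(data["CHROM"]), alpha=0.5)
    ax[i].legend()
```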

2 - Use the squeeze=False option when calling plt.subplots(). As explained in the documentation, this option forces subplots() to always return a 2D array, so you have to change the way you index the axes slightly:

fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True, squeeze=False)
for i, data in hap_stats.iterrows():
    (...)
    ax[i, 0].hist(data_i, label=str(data['CHROM']), alpha=0.5)
    (...)
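Option 2 can be wrapped in a helper that works for any number of rows. This is a minimal sketch, assuming hypothetical DataFrames built from the question's data. The helper name `plot_histograms` is illustrative only, and `enumerate()` replaces the iterrows() index for the same reason as in option 1.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd


def plot_histograms(hap_stats):
    """Hypothetical helper: one histogram subplot per CHROM row."""
    # squeeze=False: ax is always a 2D array of shape (nrows, 1)
    fig, ax = plt.subplots(nrows=len(hap_stats), sharex=True, squeeze=False)
    for i, (_, data) in enumerate(hap_stats.iterrows()):
        data_i = [int(x) for x in data["num_Vars_by_PI"].split(",")]
        ax[i, 0].hist(data_i, label=str(data["CHROM"]), alpha=0.5)
        ax[i, 0].legend()
    return fig, ax


one_row = pd.DataFrame({"CHROM": [2], "num_Vars_by_PI": ["94,78,10,69,25"]})
three_rows = pd.DataFrame({
    "CHROM": [1, 2, 3],
    "num_Vars_by_PI": ["60,6", "94,78,10,69,25", "6,12,14,17"],
})

fig1, ax1 = plot_histograms(one_row)
fig3, ax3 = plot_histograms(three_rows)
```

The same code path handles one row and many rows. If you prefer 1D indexing afterwards, `ax[:, 0]` gives the single column of Axes as a flat array.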

Making subplots of histograms from a pandas DataFrame with matplotlib?
