Question

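I'm new to matplotlib and am trying to plot a histogram. I'm interested in the lower bin ranges, so I split up my bin ranges, but the result looks a bit ugly, with a lot of empty space on the right. I have some code that generates this histogram, but I would like to change it as follows: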

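Combine all bars after position 150 on the x axis into a single 150+ bar, so that the bars in the lower range are displayed better.
Change the color of the bars:
the leftmost bar in a different color
the bars between x-axis ticks 5 and 40 in a different color
the bars from 40 upward in a different color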

import matplotlib
matplotlib.use('PS')
import matplotlib.pyplot as plt
import numpy as np
# sample data, These are not actual values since I have a large csv file  
# with 1000's of rows.
values=[1,1,1,1,1,1,1,2,2,2,2,4,4,4,5,6,7,8,9,10,111,12,23,30,30,35,353,35,25,25,25,15,15,15,20,20,20,40,40,40,45,50,55,50,50,100,200,300,400]

limit1, limit2 = 50, 500
binwidth1, binwidth2 = 5, 100
binr=list(range(0, limit1, binwidth1)) + list(range(100, limit2, binwidth2))
n, bins, patches=plt.hist(values, bins = binr)
one, fifty = np.percentile(values, [0.5,50])
for patch, rightside, leftside in zip(patches, bins[1:], bins[:-1]):
    if rightside < one:
        patch.set_facecolor('green')
    elif leftside > fifty:
        patch.set_facecolor('red')
plt.title("Frequency Histogram")
plt.xlabel("Word Count")
plt.ylabel("Frequency")
plt.savefig(plot_file)  # plot_file: output path, not defined in this excerpt
plt.close()

Answer 1

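I'm not entirely sure what you're trying to do, especially since your expectations seem to be somewhat at odds with your simple example (I mean the percentile-based coloring).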

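Anyway, I suggest that you use np.histogram directly (since you have already imported numpy) and call plt.bar manually. The main advantage of this (besides finer control over the output, at the cost of a little extra effort) is that you can pass a list containing the color of each bar.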

A modified version of your example:

import numpy as np
import matplotlib.pyplot as plt

values=[1,1,1,1,1,1,1,2,2,2,2,4,4,4,5,6,7,8,9,10,111,12,23,30,30,35,353,35,25,25,25,15,15,15,20,20,20,40,40,40,45,50,55,50,50,100,200,300,400]

limit1, limit2 = 50, 500
binwidth1, binwidth2 = 5, 100
binr=list(range(0, limit1, binwidth1)) + list(range(100, limit2, binwidth2))

# improvement 1: merge bins above 150, keep the same maximum
thresh = 150
# keep the first value after the threshold too
binr_tmp = [val for val in binr if val<=thresh] 
binr = binr_tmp + [binr[len(binr_tmp)], binr[-1]]

# improvement 2: use np.histogram explicitly, feed into plt.bar later (for colors)
bin_vals, bins = np.histogram(values, bins=binr)
bins_left = bins[:-1]  # NumPy array of left edges, so the comparisons below are element-wise
bins_width = np.diff(bins)
bins_right = bins_left + bins_width
one, fifty = np.percentile(values, [0.5,50])

# "change the color of bars": you did the same thing earlier
# improvement: use a numpy.array for a colour list, set for each bar separately
# (possibility for array indexing)
# just don't forget to turn into a list() when calling plt.bar
bins_color = np.array(['blue']*len(bins_left), dtype=object)
bins_color[bins_left>fifty] = 'red'
bins_color[bins_left+bins_width<one] = 'green'

# "leftmost bar to a different color":
bins_color[0] = 'magenta'

# "bars from 40+ different color": would conflict with percentile-based original version
thresh2 = 40
#bins_color[bins_right>thresh2] = 'olive'

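# recent matplotlib versions no longer accept left=; pass the left edges positionally
# and use align='edge' so each bar starts at its left edge
hbars = plt.bar(bins_left, bin_vals, width=bins_width, color=list(bins_color), align='edge')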
plt.title("Frequency Histogram")
plt.xlabel("Word Count")
plt.ylabel("Frequency")
#plt.savefig(plot_file)
#plt.close()
plt.show()

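I tried to leave informative comments. The main thing to note is that np.histogram generates the bin values, which are then fed into plt.bar. Compared with plt.hist, plt.bar needs more involved input (in particular, each bar's left edge and width have to be specified manually), but it also allows much more customization.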

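As you can see in "improvement 1", I merged your bins above the thresh value and left the other bins unchanged. I understand that you asked for this in order to leave more room for the x<50 region. You could achieve that by manually moving the last value of binr to pull your last (merged) bar closer. If you do this, you should indicate it on the x axis using plt.xlabel.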

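The reason I didn't do this is that such a manipulation heavily distorts your data and introduces a lot of bias. You should generally avoid it. If you do intend to distort the bars visually and are comfortable with that, then go ahead as described in the previous paragraph.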
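If you do decide to go that way, here is a minimal sketch of what pulling in the last bar could look like. The bin edges (with an explicit edge at 150), the displayed right edge of 200 and the "150+" tick label are all assumptions made for illustration, not something taken from your code; the counts are still computed from the true edges, only the drawn width of the last bar changes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

values = [1,1,1,1,1,1,1,2,2,2,2,4,4,4,5,6,7,8,9,10,111,12,23,30,30,35,353,35,
          25,25,25,15,15,15,20,20,20,40,40,40,45,50,55,50,50,100,200,300,400]

# Hypothetical edges: 5-wide bins up to 50, then 50-wide bins, then one bin for 150..400
true_edges = np.array(list(range(0, 50, 5)) + [50, 100, 150, 400])
counts, _ = np.histogram(values, bins=true_edges)

# Draw the last bin as 150..200 instead of 150..400 (display only; counts are unchanged)
shown_edges = true_edges.astype(float)
shown_edges[-1] = 200

fig, ax = plt.subplots()
ax.bar(shown_edges[:-1], counts, width=np.diff(shown_edges), align="edge")

# Make the distortion explicit on the axis itself
ax.set_xticks([0, 50, 100, 150])
ax.set_xticklabels(["0", "50", "100", "150+"])
ax.set_xlabel("Word Count (last bar covers 150 and above)")
ax.set_ylabel("Frequency")
```

Finish with fig.savefig(...) or plt.show() as in your own script. Note that the last bar now looks as wide as the 100 to 150 bar even though it covers a much larger range, which is exactly the distortion mentioned above.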

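I'm including the result above; admittedly, the difference from the original is not large. However, I believe that by introducing the bins_color array, most of the manipulations you want to do will become easier.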
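For the range-based coloring you listed (leftmost bar, bars between 5 and 40, bars from 40 upward), the same idea can be sketched without percentiles. The helper name colors_by_range, the cut-offs and the color names are my own choices for illustration, and I am assuming each bar is classified by its left edge.

```python
def colors_by_range(left_edges, low=5, high=40):
    """Return one color name per bar, chosen from the bar's left edge."""
    colors = []
    for left in left_edges:
        if left < low:
            colors.append("magenta")  # leftmost bar(s), starting below `low`
        elif left < high:
            colors.append("blue")     # bars starting in [low, high)
        else:
            colors.append("olive")    # bars starting at `high` or above
    return colors

# Bin edges as produced by the merged binr above: 0, 5, ..., 45, 100, 200, 400
edges = list(range(0, 50, 5)) + [100, 200, 400]
bar_colors = colors_by_range(edges[:-1])
print(bar_colors)
```

The resulting list has one entry per bar, so it can be passed as the color argument of plt.bar in place of list(bins_color).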
