Question

Say I have two data sets and plot a stacked histogram of them with some weights. Can I find the total bin count of the data elements greater than some number (that is, greater than a particular x-coordinate)? To illustrate the question, I have done the following:

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 0.6, 1000)
data2 = np.random.normal(0, 1.4, 1000)
# one constant weight per event in each data set
weight1 = np.array([0.5] * len(data1))
weight2 = np.array([0.9] * len(data2))
hist = plt.hist((data1, data2), weights=(weight1, weight2), stacked=True, range=(-5, 5))
plt.show()

[Image: stacked histogram of the two weighted data sets]

Now, how can I find the bin count for, say, x greater than -2?

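So far, to get an answer, I am doing the following:

n1, _, _ = plt.hist((data1, data2), weights=(weight1, weight2), stacked=False, range=(-2, 10000))
bin_counts = sum(sum(n1))
print(bin_counts)

Here I chose the upper end of the range to be a very large number, so that I get all the bin counts for -2 and above.

Is there a more efficient way than this?

Also, how can I get bin_counts for a variable x, where x goes from the minimum to the maximum x-coordinate in some steps?

Any help would be greatly appreciated!

Thanks a lot!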

Answer 1

You can do the following:

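# n is a list of arrays here (one per data set), because you have 2 histograms
n, bins, _ = plt.hist(...)
# for each data set, keep the counts of bins whose left edge is above x
# (zip stops at the shorter sequence, so each count is paired with its bin's left edge)
n_over_x = [[val for val, left_edge in zip(selected_cnt, bins) if left_edge > x] for selected_cnt in n]
# sum up the list of lists
# note: with stacked=True the arrays in n are cumulative, so sum only n_over_x[-1] in that case
result = sum(sum(part_list) for part_list in n_over_x)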
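If only the numbers are needed and not the plot, the same idea can be sketched with np.histogram, which returns per-bin weighted counts without drawing anything. This is a minimal sketch, not the answerer's code: the seeded generator, the shared `edges` array, and the helper `weight_between` are assumptions introduced for illustration. It keeps bins whose left edge is at or above x (matching the range=(-2, ...) approach in the question) and cross-checks the result against a direct sum of the weights.

```python
import numpy as np

# Assumption: a seeded generator, used only to make the sketch repeatable.
rng = np.random.default_rng(0)
data1 = rng.normal(0, 0.6, 1000)
data2 = rng.normal(0, 1.4, 1000)
weight1 = np.full(data1.size, 0.5)
weight2 = np.full(data2.size, 0.9)

x = -2  # threshold on the x-axis

# Shared bin edges, equivalent to range=(-5, 5) with 10 equal-width bins.
edges = np.linspace(-5, 5, 11)
n1, _ = np.histogram(data1, bins=edges, weights=weight1)
n2, _ = np.histogram(data2, bins=edges, weights=weight2)

# Keep bins whose left edge is at or above x. An edge sits exactly at -2,
# so no bin straddles the threshold in this setup.
keep = edges[:-1] >= x
count_from_bins = n1[keep].sum() + n2[keep].sum()


def weight_between(data, weights, low, high):
    """Hypothetical helper: total weight of the values in [low, high]."""
    mask = (data >= low) & (data <= high)
    return weights[mask].sum()


# Cross-check without any histogram at all.
count_direct = (weight_between(data1, weight1, x, 5)
                + weight_between(data2, weight2, x, 5))
print(count_from_bins, count_direct)
```

The two numbers should agree up to floating-point rounding. When x does not coincide with a bin edge, the direct sum is the exact answer, while the bin-based count depends on how the straddling bin is treated.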

Answer 2

Here is what I came up with:

def my_range(start, end, step):
    # yield start, start + step, ... up to and including end
    while start <= end:
        yield start
        start += step

bin_min = -5
bin_max = 10
bin_step = 1
count_max = (bin_max - bin_min) // bin_step  # number of steps from bin_min to bin_max

# b_counts holds the normalized events (normalized according to the weights), one entry per threshold
b_counts = [0] * (count_max + 1)
value = [0] * (count_max + 1)

for i in my_range(0, count_max, 1):
    value[i] = bin_min + i * bin_step  # threshold on the x-axis
    # the upper limit is a very large number, so every event at or above value[i] is counted
    n1, _, _ = plt.hist((data1, data2), weights=(weight1, weight2), stacked=False, range=(value[i], 10000))
    b_counts[i] = sum(sum(n1))
    print(b_counts[i], value[i])

I believe this gives me the events (of the histogram) in the range (value, 10000), where value is the variable.
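The loop above recomputes a histogram for every threshold. As a possible alternative, the counts for all thresholds can be obtained from a single histogram by taking a reverse cumulative sum over the bins. The sketch below is an assumption-laden illustration rather than the answerer's method: it uses np.histogram instead of plt.hist, a seeded generator for repeatability, and it only counts events up to bin_max (not up to 10000), which is practically the same here because the sample data rarely exceed 10.

```python
import numpy as np

# Assumption: a seeded generator, used only to make the sketch repeatable.
rng = np.random.default_rng(0)
data1 = rng.normal(0, 0.6, 1000)
data2 = rng.normal(0, 1.4, 1000)
weight1 = np.full(data1.size, 0.5)
weight2 = np.full(data2.size, 0.9)

bin_min, bin_max, bin_step = -5, 10, 1

# Thresholds double as bin edges: -5, -4, ..., 10.
edges = np.arange(bin_min, bin_max + bin_step, bin_step)

n1, _ = np.histogram(data1, bins=edges, weights=weight1)
n2, _ = np.histogram(data2, bins=edges, weights=weight2)
per_bin = n1 + n2  # combined weighted count in each bin

# Reverse cumulative sum: entry i is the weighted count of events
# between edges[i] and bin_max.
counts_above = np.cumsum(per_bin[::-1])[::-1]

for value, count in zip(edges[:-1], counts_above):
    print(value, count)
```

Each printed pair is a threshold and the weighted count of events at or above it, so the entry for -2 plays the role of bin_counts in the question.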

Bin counts in a stacked (weighted) histogram for x-coordinates greater than a certain value
