numpy.histogram: retrieve the sum of squared weights in each bin

Time: 2017-07-17 21:57:10

Tags: python numpy scipy

Is it possible to retrieve the sum of the squared weights in each bin of a histogram in numpy (or scipy)? I would like an error on each bin height in my histogram. For unweighted data the statistical error on each bin height should be sqrt(N), where N is the bin height. For weighted data, however, I need the sum of the squared weights. numpy.histogram cannot do this, but is there some other functionality in numpy or scipy for binning one array (e.g. the weights array) based on the values of a different array (e.g. the array of values I'm histogramming)? I've looked through the documentation carefully but haven't found anything yet.
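For reference, numpy.histogram itself will bin any weights array you give it, so passing the squared weights yields the per-bin sum of squared weights directly. A minimal sketch (the variable names x, w and edges are illustrative):

import numpy as np

x = np.array([2, 9, 4, 8])          # values being histogrammed
w = np.array([0.1, 0.2, 0.3, 0.4])  # per-entry weights
edges = [0, 5, 10]

# bin heights: sum of weights per bin
heights, _ = np.histogram(x, bins=edges, weights=w)
# sum of squared weights per bin; its square root is the per-bin error
sumw2, _ = np.histogram(x, bins=edges, weights=w**2)
errors = np.sqrt(sumw2)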

2 Answers:

Answer 0 (score: 2)

As Alex said, numpy.digitize is what you want. The function returns which bin each entry of x belongs to. You can then use this information to access the correct elements of w:

x = np.array([2,9,4,8])
w = np.array([0.1,0.2,0.3,0.4])

bins = np.digitize(x, [0,5,10])

# access elements for first bin
first_bin_ws = w[np.where(bins==1)[0]]
# error of first bin
error = np.sqrt(np.sum(first_bin_ws**2.))

The last line then computes the error on the first bin. Note that np.digitize starts counting at 1.
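To get the error on every bin rather than only the first, the same idea can be looped over the bin indices returned by np.digitize. A minimal sketch reusing the arrays above:

import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])
edges = [0, 5, 10]

bins = np.digitize(x, edges)
# in-range values get bin indices 1 .. len(edges) - 1
errors = np.array([np.sqrt(np.sum(w[bins == i]**2.))
                   for i in range(1, len(edges))])
# errors -> array([0.31622777, 0.4472136 ])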

Answer 1 (score: 1)

If I may add an addendum to @obachtos's answer, I have extended it into a function that demonstrates the uncertainty for the full histogram:

def hist_bin_uncertainty(data, weights, bin_edges):
    """
    The statistical uncertainty per bin of the binned data.
    If there are weights then the uncertainty will be the root of the
    sum of the weights squared.
    If there are no weights (weights = 1) this reduces to the root of
    the number of events.

    Args:
        data: `array`, the data being histogrammed.
        weights: `array`, the associated weights of the `data`.
        bin_edges: `array`, the edges of the bins of the histogram.

    Returns:
        bin_uncertainties: `array`, the statistical uncertainty on the bins.

    Example:
    >>> x = np.array([2,9,4,8])
    >>> w = np.array([0.1,0.2,0.3,0.4])
    >>> edges = [0,5,10]
    >>> hist_bin_uncertainty(x, w, edges)
    array([ 0.31622777,  0.4472136 ])
    >>> hist_bin_uncertainty(x, None, edges)
    array([ 1.41421356,  1.41421356])
    >>> hist_bin_uncertainty(x, np.ones(len(x)), edges)
    array([ 1.41421356,  1.41421356])
    """
    import numpy as np
    # Bound the data and weights to be within the bin edges
    in_range_index = [idx for idx in range(len(data))
                      if data[idx] > min(bin_edges) and data[idx] < max(bin_edges)]
    in_range_data = np.asarray([data[idx] for idx in in_range_index])

    if weights is None or np.array_equal(weights, np.ones(len(weights))):
        # Default to weights of 1 and thus uncertainty = sqrt(N)
        in_range_weights = np.ones(len(in_range_data))
    else:
        in_range_weights = np.asarray([weights[idx] for idx in in_range_index])

    # Bin the weights with the same binning as the data
    bin_index = np.digitize(in_range_data, bin_edges)
    # N.B.: range(1, len(bin_edges)) is used instead of set(bin_index) as if
    # there is a gap in the data such that a bin is skipped no index would appear
    # for it in the set
    # A plain list is used because the per-bin arrays can have different lengths
    binned_weights = [
        in_range_weights[np.where(bin_index == idx)[0]] for idx in range(1, len(bin_edges))]
    bin_uncertainties = np.asarray(
        [np.sqrt(np.sum(np.square(w))) for w in binned_weights])
    return bin_uncertainties
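A possible usage of the function above, pairing the bin heights from numpy.histogram with the uncertainties it returns (variable names are illustrative):

import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])
edges = np.array([0, 5, 10])

heights, _ = np.histogram(x, bins=edges, weights=w)
errors = hist_bin_uncertainty(x, w, edges)
for lo, hi, h, e in zip(edges[:-1], edges[1:], heights, errors):
    print(f"bin {lo}-{hi}: {h:.2f} +/- {e:.2f}")
# bin 0-5: 0.40 +/- 0.32
# bin 5-10: 0.60 +/- 0.45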