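Python - Computing the histogram of a set of data

Time: 2015-04-30 12:57:30

Tags: python histogram

The Python function below is meant to compute the histogram of a data set using equal-sized bins. The correct result should be

[1, 6, 4, 6]

However, when I run the code, it returns

[7, 12, 17, 17]

which is incorrect. Does anyone know how to fix it?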

# Computes the histogram of a set of data
def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span / num_bins

    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)

    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))

4 answers:

Answer 0 (score: 5)

If you just want the histogram, use numpy:

import numpy as np
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4)
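Note that np.histogram returns a pair (counts, bin edges) rather than the counts alone. A minimal sketch of unpacking it, reusing the data from the question (the variable names here are only illustrative):

```python
import numpy as np

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]

# np.histogram returns two arrays: the per-bin counts and the bin edges.
counts, bin_edges = np.histogram(data, bins=4)

print(counts.tolist())   # [1, 6, 4, 6]
print(len(bin_edges))    # 5, i.e. one more edge than there are bins
```

The last bin is closed on both sides, so the maximum value (9 here) is counted in it.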

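Answer 1 (score: 3)

There are only two logic errors:

(1) The thresholds are calculated from 0; they must be offset by min(data).

(2) A break is missing once the matching bin has been found, so each datum is counted in every later bin as well.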

def histogram(data, num_bins):
  span = max(data) - min(data)
  bin_size = float(span) / num_bins
  thresholds = [0] * num_bins

  for i in range(num_bins):
    # Changed: thresholds are now offset by min(data)
    thresholds[i] = min(data) + bin_size * (i+1)

  counts = [0] * num_bins
  for datum in data:
    for bin_index, threshold in enumerate(thresholds):
      if datum <= threshold:
        counts[bin_index] += 1
        # Added: stop at the first matching bin so the datum is counted once
        break
  return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
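To see why these two errors produce exactly [7, 12, 17, 17], here is a small sketch that reproduces the original behaviour step by step. The names from_zero, from_min and cumulative are only illustrative and do not appear in the question.

```python
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4
bin_size = (max(data) - min(data)) / float(num_bins)

# Error (1): the question measures the upper edges from 0 ...
from_zero = [bin_size * (i + 1) for i in range(num_bins)]
# ... whereas they should start at min(data).
from_min = [min(data) + bin_size * (i + 1) for i in range(num_bins)]

# Error (2): without a break, a datum is counted in every bin whose
# threshold is >= the datum, so the counts become cumulative.
cumulative = [sum(1 for d in data if d <= t) for t in from_zero]

print([round(t, 2) for t in from_zero])  # [3.05, 6.1, 9.15, 12.2]
print([round(t, 2) for t in from_min])   # [-0.15, 2.9, 5.95, 9.0]
print(cumulative)                        # [7, 12, 17, 17]
```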

Answer 2 (score: 1)

Check the threshold definition and the if statement. This works:

def histogram(data, num_bins):

    # Find what range the data spans, and use it to calculate the bin size.
    lowest, highest = min(data), max(data)
    bin_size = (highest - lowest) / float(num_bins)

    # Calculate the num_bins + 1 bin edges, measured from min(data), not from 0.
    thresholds = [lowest + bin_size * i for i in range(num_bins + 1)]
    thresholds[-1] = highest  # guard against floating-point rounding

    print(thresholds)
    # Compute the histogram
    counts = [0 for i in range(num_bins)]
    for datum in data:
        # Increment the count of the first bin whose edges enclose the datum
        for bin_index in range(num_bins):
            if thresholds[bin_index] <= datum <= thresholds[bin_index + 1]:
                counts[bin_index] += 1
                break
    return counts

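Answer 3 (score: 1)

First, if you just want a histogram of your data, numpy provides one. However, you asked how to do it yourself. Your code suggests that you lost track of what you were trying to do, so break your function up into smaller functions. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better yet use numpy.linspace. Doing so draws your attention to the problem of incrementing relative to 0 instead of min(data), and, if you are lucky, may remind you not to count on floating-point sums accumulating exactly. So you might end up with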

def thresholds(xmin, xmax, nbins):
    span = (xmax - xmin) / float(nbins)
    thresholds = [xmin + (i+1)*span for i in range(nbins)]
    thresholds[-1] = xmax
    return thresholds
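The line thresholds[-1] = xmax is the guard against floating-point rounding mentioned above. A minimal sketch of the underlying issue follows; upper_edges is a hypothetical stand-in for the function above, renamed so that the snippet is self-contained.

```python
# Adding a step repeatedly lets rounding errors build up:
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)  # False: the accumulated sum is not exactly 1.0

def upper_edges(xmin, xmax, nbins):
    # Hypothetical helper mirroring thresholds(): multiply rather than
    # accumulate, then pin the last edge to xmax.
    width = (xmax - xmin) / float(nbins)
    edges = [xmin + (i + 1) * width for i in range(nbins)]
    edges[-1] = xmax
    return edges

edges = upper_edges(0.0, 1.0, 10)
print(edges[-1] == 1.0)  # True: the maximum always falls inside the last bin
```

Without the pinned last edge, a computed edge that lands slightly below xmax would leave the largest datum outside every bin.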

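Next, you need to get the bin counts. Again, you could use numpy.digitize. Compared with your code, the important point is not to increment more than one bin. You might end up with something like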
def counts(data, bounds):
    counts = [0] * len(bounds)
    for datum in data:
        bin = min(i for i,bound in enumerate(bounds) if bound >= datum)
        counts[bin] += 1
    return counts
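If you would rather stay with the standard library, the same "first bound that is >= datum" lookup can be sketched with bisect instead of numpy.digitize. This swaps in a different technique from the one in the answer; counts_bisect is a hypothetical name, and it assumes the bounds are sorted and the last bound is at least max(data).

```python
from bisect import bisect_left

def counts_bisect(data, bounds):
    """Count data per bin, where bounds holds the sorted upper bin edges.

    Assumes bounds[-1] >= max(data); otherwise the index is out of range.
    """
    counts = [0] * len(bounds)
    for datum in data:
        # bisect_left returns the index of the first bound >= datum.
        counts[bisect_left(bounds, datum)] += 1
    return counts

print(counts_bisect([0.5, 1.0, 1.5, 3.0], [1.0, 2.0, 3.0]))  # [2, 1, 1]
```

Each datum increments exactly one bin, and the lookup is a binary search rather than a scan over all the bounds.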

Now you are ready:

def histogram02(data, num_bins):
    xmin = min(data)
    xmax = max(data)
    th = thresholds(xmin, xmax, num_bins)
    return counts(data, th)
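For comparison, here is a rough sketch of the same pipeline built from the numpy functions mentioned above (numpy.linspace and numpy.digitize). The function name histogram_np is hypothetical, and np.bincount is an extra helper not mentioned in the answer.

```python
import numpy as np

def histogram_np(data, num_bins):
    values = np.asarray(data, dtype=float)
    # num_bins + 1 evenly spaced edges from min to max; dropping the first
    # one leaves only the upper bound of each bin.
    upper = np.linspace(values.min(), values.max(), num_bins + 1)[1:]
    # right=True gives index i with upper[i-1] < x <= upper[i], so every
    # value lands in exactly one bin.
    idx = np.digitize(values, upper, right=True)
    return np.bincount(idx, minlength=num_bins).tolist()

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(histogram_np(data, 4))  # [1, 6, 4, 6]
```

As in the pure-Python version, a value that sits exactly on an interior edge goes to the lower bin. This differs from np.histogram, which puts such values in the upper bin.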