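The Python function below computes a histogram of the data using equal-sized bins. I expect the result
[1, 6, 4, 6]
but when I run the code I get
[7, 12, 17, 17]
which is incorrect. Does anyone know how to fix it?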
# Computes the histogram of a set of data
def histogram(data, num_bins):
    # Find what range the data spans, and use it to calculate the bin size.
    span = max(data) - min(data)
    bin_size = span / num_bins
    # Calculate the thresholds for each bin.
    thresholds = [0] * num_bins
    for i in range(num_bins):
        thresholds[i] += bin_size * (i+1)
    # Compute the histogram
    counts = [0] * num_bins
    for datum in data:
        # Increment the count of the bin that the datum falls in
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
    return counts

# Some random data
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
Answer 0: (score: 5)
If you just want the histogram, use numpy:
import numpy as np
np.histogram([-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9],4)
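As a small sketch of how that call might be used: np.histogram returns a tuple of the counts and the bin edges, so the two can be unpacked separately. The names counts and edges below are my own choice, not part of the answer.

```python
import numpy as np

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]

# np.histogram returns (counts, bin_edges); with bins=4 there are 5 edges.
counts, edges = np.histogram(data, bins=4)

print(counts.tolist())  # [1, 6, 4, 6]
print(len(edges))       # 5
```

Note that numpy treats every bin as half-open, [left, right), except the last one, which also includes its right edge.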
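Answer 1: (score: 3)
There are only two logic errors:
(1) the threshold calculation, which has to start from min(data)
(2) a missing break once the matching bin has been found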
def histogram(data, num_bins):
    span = max(data) - min(data)
    bin_size = float(span) / num_bins
    thresholds = [0] * num_bins
    for i in range(num_bins):
        # Changed: offset each threshold by min(data)
        thresholds[i] = min(data) + bin_size * (i+1)
    counts = [0] * num_bins
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                # Added: stop at the first bin that matches
                break
    return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print("Correct result:\t" + str([1, 6, 4, 6]))
print("Your result:\t" + str(histogram(data, num_bins=4)))
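To see why both changes are needed, here is a rough sketch that runs the same counting loop with each fix switched on or off. The helper count_bins and its stop_at_first_match flag are hypothetical names introduced only for this illustration, and the last threshold is pinned to max(data) as a precaution against rounding.

```python
data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
num_bins = 4

def count_bins(data, thresholds, stop_at_first_match):
    """Count data per bin; optionally stop at the first matching bin."""
    counts = [0] * len(thresholds)
    for datum in data:
        for bin_index, threshold in enumerate(thresholds):
            if datum <= threshold:
                counts[bin_index] += 1
                if stop_at_first_match:
                    break
    return counts

bin_size = (max(data) - min(data)) / float(num_bins)
# Thresholds measured from 0, as in the question.
from_zero = [bin_size * (i + 1) for i in range(num_bins)]
# Thresholds measured from min(data), as in this answer.
from_min = [min(data) + bin_size * (i + 1) for i in range(num_bins)]
from_min[-1] = max(data)  # assumption: pin the last edge to avoid rounding drift

print(count_bins(data, from_zero, False))  # [7, 12, 17, 17]  neither fix
print(count_bins(data, from_zero, True))   # [7, 5, 5, 0]     break only
print(count_bins(data, from_min, False))   # [1, 7, 11, 17]   thresholds only
print(count_bins(data, from_min, True))    # [1, 6, 4, 6]     both fixes
```

Without the break, every datum is counted in its own bin and in all later bins, so the counts come out cumulative. Without the offset, the bins cover the wrong range.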
Answer 2: (score: 1)
Check the threshold definition and the if statement. This works:
def histogram(data, num_bins):
    # Find what range the data spans, and use it to calculate the bin size.
    lowest = min(data)
    span = max(data) - lowest
    bin_size = span / float(num_bins)
    # Calculate the num_bins + 1 bin edges, starting from the smallest value.
    thresholds = [lowest + bin_size * i for i in range(num_bins + 1)]
    # Pin the last edge so rounding cannot push it below max(data).
    thresholds[-1] = max(data)
    print(thresholds)
    # Compute the histogram
    counts = [0 for i in range(num_bins)]
    for datum in data:
        # Increment the count of the first bin whose edges enclose the datum
        for bin_index in range(num_bins):
            if thresholds[bin_index] <= datum <= thresholds[bin_index + 1]:
                counts[bin_index] += 1
                break
    return counts
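Answer 3: (score: 1)
First off, if you just want a histogram of your data, numpy provides one. But you asked how to do it yourself. Your code suggests you lost track of what you were trying to do, so break the function into smaller ones. For example, to compute the thresholds, write a function thresholds(xmin, xmax, nbins), or better yet use numpy.linspace. Doing so draws attention to the problem that arises if you assume the thresholds increase from 0 rather than from min(data), and, if you are lucky, may remind you not to count on exact floating-point accumulation. So you might end up with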
def thresholds(xmin, xmax, nbins):
    span = (xmax - xmin) / float(nbins)
    thresholds = [xmin + (i+1)*span for i in range(nbins)]
    thresholds[-1] = xmax
    return thresholds
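The floating-point remark is the reason the last threshold is set to xmax explicitly. As a minimal illustration, with values chosen by me rather than taken from the question, repeatedly adding a step of 0.1 does not land exactly on 1.0:

```python
xmin, xmax, nbins = 0.0, 1.0, 10
step = (xmax - xmin) / nbins

# Repeatedly adding the step lets rounding errors accumulate.
edge = xmin
for _ in range(nbins):
    edge += step

print(edge == xmax)  # False
print(edge < xmax)   # True: a datum equal to xmax would miss the last bin

# Building each threshold from xmin and pinning the last one avoids this.
pinned = [xmin + (i + 1) * step for i in range(nbins)]
pinned[-1] = xmax
print(pinned[-1] == xmax)  # True
```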
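Next, you need the bin counts. Again, you could use numpy.digitize. Compared with your code, the important point is not to increment more than one bin per datum. In the end you might have something like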
def counts(data, bounds):
    counts = [0] * len(bounds)
    for datum in data:
        bin = min(i for i,bound in enumerate(bounds) if bound >= datum)
        counts[bin] += 1
    return counts
Now you are ready:
def histogram02(data, num_bins):
    xmin = min(data)
    xmax = max(data)
    th = thresholds(xmin, xmax, num_bins)
    return counts(data, th)
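If you would rather stay within the standard library, the search for the first bound that is greater than or equal to the datum can also be written with bisect.bisect_left, which plays a role similar to numpy.digitize here. This is only a sketch under the same bin convention as above, and histogram_bisect is a name I made up for it:

```python
from bisect import bisect_left

def histogram_bisect(data, num_bins):
    xmin, xmax = min(data), max(data)
    width = (xmax - xmin) / float(num_bins)
    # Upper bound of each bin; the last one is pinned to xmax.
    bounds = [xmin + (i + 1) * width for i in range(num_bins)]
    bounds[-1] = xmax
    counts = [0] * num_bins
    for datum in data:
        # bisect_left returns the index of the first bound >= datum.
        counts[bisect_left(bounds, datum)] += 1
    return counts

data = [-3.2, 0, 1, 1.5, 1.6, 1.9, 5, 6, 9, 1, 4, 5, 8, 9, 5, 6.7, 9]
print(histogram_bisect(data, 4))  # [1, 6, 4, 6]
```

Because the bounds are sorted, each lookup is a binary search rather than a scan over all bins, which only starts to matter when there are many bins.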