I am trying to write a Python program that computes a histogram from a list of numbers such as:
1
3
2
3
4
5
3.2
4
2
2
So the input arguments are the filename and the number of intervals.
The program code is:
#!/usr/bin/env python
import os, sys, re, string, array, math
import numpy
Lista = []
db = sys.argv[1]
db_file = open(db,"r")
ic=0
nintervals= int(sys.argv[2])
while 1:
    line = db_file.readline()
    if not line:
        break
    ll=string.split(line)
    #print ll[6]
    Lista.insert(ic,float(ll[0]))
    ic=ic+1
lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax
width=666.666
width=(lmax-lmin)/nintervals
print "width= ",width
nelements=len(Lista)
print "nelements= ",nelements
print " "
Histogram = numpy.zeros(shape=(nintervals))
for item in Lista:
    #print item
    int_number = 1 + int((item-lmin)/width)
    print " "
    print "item,lmin= ",item,lmin
    print "(item-lmin)/width= ",(item-lmin)," / ",width," ====== ",(float(item)-float(lmin))/float(width)
    print "int((item-lmin)/width)= ",int((item-lmin)/width)
    print item , " belongs to interval ", int_number, " which is from ", lmin+width*(int_number-1), " to ",lmin+width*int_number
    Histogram[int_number] = Histogram[int_number] + 1
But somehow I am completely lost and I get strange errors. Can anyone help?
Thanks
P.S. This is the output:
item,lmin= 1.0 1.0
(item-lmin)/width= 0.0 / 0.666666666667 ====== 0.0
int((item-lmin)/width)= 0
1.0 belongs to interval 1 which is from 1.0 to 1.66666666667
item,lmin= 2.0 1.0
(item-lmin)/width= 1.0 / 0.666666666667 ====== 1.5
int((item-lmin)/width)= 1
2.0 belongs to interval 2 which is from 1.66666666667 to 2.33333333333
item,lmin= 3.0 1.0
(item-lmin)/width= 2.0 / 0.666666666667 ====== 3.0
int((item-lmin)/width)= 3
3.0 belongs to interval 4 which is from 3.0 to 3.66666666667
Traceback (most recent call last):
  File "from_list_to_histogram.py", line 43, in <module>
    Histogram[int_number] = Histogram[int_number] + 1
IndexError: index out of bounds
The most important errors are:
(item-lmin)/width= 1.0 / 0.666666666667 ====== 1.5
and
IndexError: index out of bounds
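Answer 0 (score: 1)
I think the problem may be an off-by-one error:
int_number = 1 + int((item-lmin)/width)
Why the 1 +? Python indices on an array of length N run from 0 to N-1 inclusive. Here the 1 + makes int_number range from 1 to 1 + (lmax-lmin)/width, that is, 1 + nintervals given the formula for width, while Histogram is sized to hold only nintervals items. So it is really an off-by-two error: the 1 + makes it worse, but the error would exist even without it (for lmax only). Make the intervals slightly wider so that lmax falls inside the last one rather than just beyond it, and drop the 1 +, and things may get better.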
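To make the index ranges concrete, here is a minimal sketch using the sample data from the question. It assumes 6 intervals, which is what the printed width of 0.666... implies for data between 1 and 5. The helper name bin_index is hypothetical, and rather than widening the intervals it clamps lmax into the last bin, which is one possible way to apply the fix described above. It is written with print() so that it runs on Python 3.

```python
data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]
nintervals = 6  # assumption: matches the width of 0.666... printed in the question
lmin, lmax = min(data), max(data)
width = float(lmax - lmin) / nintervals


def bin_index(item):
    """Zero-based bin index; lmax is clamped into the last bin (hypothetical helper)."""
    idx = int((item - lmin) / width)
    return min(idx, nintervals - 1)


# Indices produced by the formula from the question (with the leading "1 +")
original = [1 + int((x - lmin) / width) for x in data]
# Indices produced by the zero-based, clamped version
fixed = [bin_index(x) for x in data]

print("valid indices: 0 to", nintervals - 1)
print("largest index, original formula:", max(original))
print("largest index, clamped formula:", max(fixed))
```

The original formula yields at least one index past the end of a length-6 array, while the clamped version stays within 0 to nintervals - 1.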
Answer 1 (score: 1)
Here is a more Pythonic approach.
from itertools import groupby
from math import floor
data = [1,3,2,3,4,5,3.2,4,2,2,3.6]
data.sort()
nintervals = 3
lmax = max(data)
lmin = min(data)
width = 1.0*(lmax-lmin)/nintervals
def grouper(item):
    # Zero-based interval index; lmax is clamped into the last interval
    return min(floor(1.0*(item-lmin)/width), nintervals-1)
for i, b in groupby(data, grouper):
    print '%.3f <= i < %.3f ' %(lmin + i * width, lmin + (i+1) * width), list(b)
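Answer 2 (score: 0)
On the last line you access Histogram with an index that is too large. You should make sure that int_number is at most
len(Histogram) - 1
There may be an off-by-one error causing this.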
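One way to find the offending values is to compare each index against len(Histogram) - 1 before using it. The following is only a diagnostic sketch: it assumes the sample data from the question and 6 intervals (inferred from the printed width), keeps the question's index formula unchanged, and collects the out-of-range cases in a list named out_of_range, which is my own name. It uses print() so that it runs on Python 3.

```python
data = [1, 3, 2, 3, 4, 5, 3.2, 4, 2, 2]
nintervals = 6  # assumption: inferred from the width printed in the question
lmin, lmax = min(data), max(data)
width = float(lmax - lmin) / nintervals

histogram = [0] * nintervals
out_of_range = []  # (item, index) pairs that would raise IndexError

for item in data:
    int_number = 1 + int((item - lmin) / width)  # formula from the question
    if int_number > len(histogram) - 1:
        # Record the problem instead of crashing
        out_of_range.append((item, int_number))
        continue
    histogram[int_number] += 1

for item, index in out_of_range:
    print("item %s maps to index %d, but the largest valid index is %d"
          % (item, index, len(histogram) - 1))
```

With this data only the maximum value is reported, which is consistent with an off-by-one style error in the index formula.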
Answer 3 (score: 0)
I just removed the code that loads from the file and rewrote the rest into something more readable
from math import floor
Lista = [1,3,2,3,4,5,3.2,4,2,2]
ic=0
nintervals= 3
lmin=min(Lista)
print "min= ",lmin
lmax=max(Lista)
print "max= ",lmax
width=1.0*(lmax-lmin)/nintervals
print "width= ",width
nelements=len(Lista)
print "nelements= ",nelements
print " "
histogram =[0]*nintervals
for item in Lista:
    ind = int(floor(1.0*(item-lmin)/width))
    if ind==nintervals:
        ind=ind-1
    histogram[ind]+=1
for i,v in enumerate(histogram):
    print "from", lmin+i*width, "to", lmin+(i+1)*width, "are",v,"values"
for i,v in enumerate(histogram):
    print "Visual presentation:","="*int(round(v*40.0/lmax))
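Since the file-loading part was left out above, here is a hedged sketch of how it could be put back together with the same binning logic. The function names read_numbers and histogram are my own, the input is assumed to hold one number in the first whitespace-separated field of each line (as in the question), and the values are assumed not to be all equal, since the width would otherwise be zero. It is written for Python 3.

```python
from math import floor


def read_numbers(path):
    """Read the first whitespace-separated field of every non-empty line as a float."""
    with open(path) as handle:
        return [float(line.split()[0]) for line in handle if line.strip()]


def histogram(values, nintervals):
    """Count values per equal-width bin; the maximum value goes into the last bin.

    Assumes the values are not all equal (otherwise width would be zero).
    """
    lmin, lmax = min(values), max(values)
    width = float(lmax - lmin) / nintervals
    counts = [0] * nintervals
    for item in values:
        # Zero-based index, clamped so that lmax does not fall past the end
        ind = min(int(floor((item - lmin) / width)), nintervals - 1)
        counts[ind] += 1
    return counts
```

For example, histogram(read_numbers("data.txt"), 3) would return the three counts for a file named data.txt, where the filename is only a placeholder.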