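I'm trying to implement an entropy-based histogram based on this paper (PDF warning!), but the pseudocode for the maximum-entropy algorithm is not clear at all.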
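They compute the "area" of a bucket as a_i = f_i * s_i (Section 4.2), where f_i is the frequency of value i and s_i is its "spread". My first problem is that I'm not sure what the spread is, but according to reference [16] it should be (v_(i+1) - v_i), meaning "the distance to the next item".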
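To make my reading of the definitions concrete, here is a minimal sketch of how I understand spreads and areas. It assumes the spread of a value is the gap to the next distinct value in sorted order. The function name `spreads_and_areas` and the `last_spread` default for the final value (which has no successor) are my own assumptions, not something taken from the paper. It is written for Python 3.

```python
def spreads_and_areas(frequency, last_spread=1):
    """Return (values, spreads, areas) for a {value: frequency} mapping.

    Assumes spread s_i = v_(i+1) - v_i over the sorted distinct values.
    The last value has no successor, so it gets `last_spread`
    (a hypothetical convention chosen for this sketch).
    """
    values = sorted(frequency)
    # Gap between each value and the next one in sorted order.
    spreads = [nxt - cur for cur, nxt in zip(values, values[1:])]
    spreads.append(last_spread)
    # Area a_i = f_i * s_i, as in Section 4.2.
    areas = [frequency[v] * s for v, s in zip(values, spreads)]
    return values, spreads, areas


values, spreads, areas = spreads_and_areas({1: 4, 2: 1, 5: 2, 6: 3})
print(values, spreads, areas)
```

With this reading, the areas and the frequencies are two different lists of the same length, which is why it matters which one Ab refers to.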
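In the algorithm, however, they use Ab as a list of areas on line 7 (for example) and as a list of frequencies on line 14 (for example). So it's not clear whether Ab is a list of areas, a list of frequencies, or whether they are just swapping the two at random...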
Can you help me clear up the pseudocode? I wrote an implementation in Python, but it doesn't work: I get negative values for localMinH:
    def build_struct(self):
        self.conn = sqlite3.connect(self.db)
        self.cursor = self.conn.cursor()
        self.calculateFrequency()
        self.calculateArea()
        self.splits = []
        self.entropies = {}
        minHeap = [(self.H(self.frequency.keys()), 0, len(self.frequency), 0)]
        while len(self.splits) < self.parameter:
            previous = minHeap
            minHeap = []
            for bucket in previous:
                a = bucket[1]
                b = bucket[2]
                wb = sum(self.areas[a:b])
                if wb > 1:
                    tr = sum(self.frequency.keys()[a:b])
                    locCutPos = tl = hl = 0
                    localMinH = -1
                    hr = ho = self.H(self.frequency.keys()[a:b])
                    for j in xrange(len(self.areas[a:b]) - 1):
                        x = self.frequency[self.frequency.keys()[j+a]]
                        tl += x
                        tr -= x
                        hl = self.H2(x/(x+tl), tl/(x+tl)) + (tl/(tl+x))*hl
                        #print a,b,x,tr
                        hr = (hr - self.H2(x/tr, (tr-x)/tr))* (tr/(tr-x))
                        hmenos = ho - (hl + hr)
                        if (localMinH == -1) or (hmenos < localMinH):
                            locCutPos = j
                            localMinH = hmenos
                    print wb, localMinH
                    heapq.heappush(minHeap, (wb*localMinH, a, a+locCutPos, locCutPos))
                    heapq.heappush(minHeap, (wb*localMinH, a+locCutPos, b, locCutPos))
            bucket = minHeap[0]
            self.splits.append(bucket[1] + bucket[3])
            self.entropies[bucket[1] + bucket[3]] = bucket[0]
self.frequency is a dict with the values as keys and their frequencies as values, and self.areas is just a list of areas.
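For cross-checking the incremental hl/hr updates, a brute-force version may help. The sketch below is not the paper's pseudocode. It assumes Shannon entropy over the frequencies inside a bucket, and it assumes the two halves of a split are weighted by their share of the bucket's total frequency. The names `entropy`, `split_gain` and `extend_entropy` are hypothetical. It is written for Python 3, where `/` on integers does not truncate the way it does in the Python 2 code above.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of the distribution given by non-negative counts."""
    total = float(sum(counts))
    return -sum(c / total * math.log(c / total, 2) for c in counts if c > 0)


def split_gain(counts, cut):
    """Entropy of the whole bucket minus the weighted entropies of its two halves."""
    left, right = counts[:cut], counts[cut:]
    total = float(sum(counts))
    weighted = (sum(left) / total) * entropy(left) \
        + (sum(right) / total) * entropy(right)
    return entropy(counts) - weighted


def extend_entropy(h_old, total_old, x):
    """Entropy of a group after adding one item with count x.

    Uses the grouping identity; total_old is the group's total BEFORE adding x.
    """
    total_new = float(total_old + x)
    return entropy([x, total_old]) + (total_old / total_new) * h_old


counts = [4, 1, 2, 3]
for cut in range(1, len(counts)):
    gain = split_gain(counts, cut)
    # Grouping property: the gain equals the two-way entropy of the split shares.
    expected = entropy([sum(counts[:cut]), sum(counts[cut:])])
    assert abs(gain - expected) < 1e-9
    print(cut, round(gain, 4))
```

Under these assumptions the weighted gain can never be negative, because it equals the entropy of the two-way split. An unweighted difference such as ho - (hl + hr) carries no such guarantee, so comparing each step of the loop against `split_gain` and `extend_entropy` is one way to locate where the two computations diverge.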