Entropy-based histogram for database selectivity

Time: 2014-06-12 14:37:37

Tags: python database relational-database histogram entropy

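I am trying to implement an entropy-based histogram based on this paper (PDF warning!), but the pseudocode of the maximum-entropy algorithm is not clear at all.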

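They compute the "area" of a bucket as a_i = f_i * s_i (Section 4.2), where f_i is the frequency of value i and s_i is its "spread". My first problem is that I am not sure what the spread is, but according to reference [16] it should be (v_(i+1) - v_i), which means "the distance to the next item".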
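A minimal sketch of this reading of the spread and area, in Python 3. The function name `spreads_and_areas` is hypothetical, and giving the last value (which has no successor) a spread of 1 is an assumption of the sketch, not something taken from the paper's pseudocode.

```python
from collections import Counter


def spreads_and_areas(values, last_spread=1):
    """Return (sorted distinct values, frequencies, spreads, areas).

    Assumption: spread s_i = v_(i+1) - v_i, and the last value, which has
    no successor, gets `last_spread` (hypothetical default of 1).
    """
    frequency = Counter(values)        # value -> number of occurrences
    distinct = sorted(frequency)       # v_1 < v_2 < ... < v_D
    freqs = [frequency[v] for v in distinct]
    # Distance from each value to the next distinct value.
    spreads = [nxt - cur for cur, nxt in zip(distinct, distinct[1:])]
    if distinct:
        spreads.append(last_spread)
    # Area a_i = f_i * s_i for every distinct value.
    areas = [f * s for f, s in zip(freqs, spreads)]
    return distinct, freqs, spreads, areas


distinct, freqs, spreads, areas = spreads_and_areas([1, 1, 2, 5, 5, 5, 9])
print(distinct)  # [1, 2, 5, 9]
print(freqs)     # [2, 1, 3, 1]
print(spreads)   # [1, 3, 4, 1]
print(areas)     # [2, 3, 12, 1]
```

Sorting the distinct values first matters here, because the spread is only meaningful between neighbouring values.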

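However, in the algorithm they use Ab as a list of areas on line 7 (for example) and as a list of frequencies on line 14 (for example). So it is not clear whether Ab is a list of areas, a list of frequencies, or whether they are just swapping the two at random...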

Can you help me make sense of the pseudocode? I wrote an implementation in Python, but it does not work: I get negative values for localMinH:

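# Module-level imports used by the method below (Python 2).
import heapq
import sqlite3

def build_struct(self):
    self.conn = sqlite3.connect(self.db)
    self.cursor = self.conn.cursor()
    self.calculateFrequency()
    self.calculateArea()
    self.splits = []
    self.entropies = {}
    # Each entry is (weighted entropy, start index a, end index b, cut offset).
    minHeap = [(self.H(self.frequency.keys()), 0, len(self.frequency), 0)]
    # Loop until self.parameter split points have been chosen.
    while len(self.splits) < self.parameter:
        previous = minHeap
        minHeap = []
        for bucket in previous:
            a = bucket[1]
            b = bucket[2]
            wb = sum(self.areas[a:b])  # total area of bucket [a, b)
            if wb > 1:
                # tr starts as the sum of the keys in [a, b).
                tr = sum(self.frequency.keys()[a:b])
                locCutPos = tl = hl = 0
                localMinH = -1  # -1 marks "no candidate cut seen yet"
                hr = ho = self.H(self.frequency.keys()[a:b])
                # Try every cut position inside the bucket.
                for j in xrange(len(self.areas[a:b]) - 1):
                    x = self.frequency[self.frequency.keys()[j+a]]
                    tl += x
                    tr -= x
                    # Incremental update of the left and right entropies.
                    hl = self.H2(x/(x+tl), tl/(x+tl)) + (tl/(tl+x))*hl
                    #print a,b,x,tr
                    hr = (hr - self.H2(x/tr, (tr-x)/tr))* (tr/(tr-x))
                    hmenos = ho - (hl + hr)
                    # Keep the cut with the smallest hmenos seen so far.
                    if (localMinH == -1) or (hmenos < localMinH):
                        locCutPos = j
                        localMinH = hmenos
                print wb, localMinH
                # Push both halves of the bucket with the same key.
                heapq.heappush(minHeap, (wb*localMinH, a, a+locCutPos, locCutPos))
                heapq.heappush(minHeap, (wb*localMinH, a+locCutPos, b, locCutPos))
        # Record the split of the entry at the top of the heap.
        bucket = minHeap[0]
        self.splits.append(bucket[1] + bucket[3])
        self.entropies[bucket[1] + bucket[3]] = bucket[0]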

self.frequency is a dict with the values as keys and their frequencies as values, and self.areas is just a list of areas.
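The helpers H and H2 are not shown above. As a point of comparison for the incremental hl/hr updates, here is a minimal Python 3 sketch that assumes H is the Shannon entropy (in bits) of a list of frequencies and H2 is its two-outcome special case. The names `entropy`, `entropy2`, and `split_entropy` are hypothetical, and this is not claimed to be what the paper's algorithm prescribes. It checks the standard grouping identity H(whole) = H2(P_L, P_R) + P_L*H(left) + P_R*H(right) and shows that the unweighted difference H(whole) - (H(left) + H(right)) can go negative.

```python
import math


def entropy(freqs):
    """Shannon entropy in bits of the distribution given by raw frequencies."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def entropy2(p, q):
    """Entropy of a two-outcome distribution (p, q) with p + q == 1."""
    return entropy([p, q])


def split_entropy(freqs, cut):
    """Return (H(whole), H(left), H(right), P_left, P_right) for a cut index."""
    left, right = freqs[:cut], freqs[cut:]
    total = sum(freqs)
    return (entropy(freqs), entropy(left), entropy(right),
            sum(left) / total, sum(right) / total)


freqs = [2, 1, 3, 1, 1]
for cut in range(1, len(freqs)):
    h, hl, hr, pl, pr = split_entropy(freqs, cut)
    # Grouping identity: H(whole) = H2(P_L, P_R) + P_L*H(left) + P_R*H(right)
    assert abs(h - (entropy2(pl, pr) + pl * hl + pr * hr)) < 1e-12
    # The weighted reduction equals H2(P_L, P_R), so it is never negative.
    assert h - (pl * hl + pr * hr) >= -1e-12

# The unweighted difference H(whole) - (H(left) + H(right)) can be negative:
h, hl, hr, _, _ = split_entropy([1] * 8, 4)
print(h - (hl + hr))  # -1.0
```

The sketch relies on Python 3's true division; with integer frequencies, `/` in Python 2 floors the quotient unless `from __future__ import division` is in effect or an operand is a float.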

0 answers:

No answers