My Lempel-Ziv implementation makes the encoding longer

Time: 2017-01-04 00:19:09

Tags: python compression

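I can't understand why my implementation produces a string that is longer than the input.

It was implemented from the description in this document, and from that description alone.

It is designed to work on binary strings only. If anyone can shed light on why it produces a string longer than the one it started with, I'd be very grateful!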
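One way to see where the extra length can come from is to count bits per phrase. The sketch below assumes the scheme is the (pointer, bit) parsing that the code in this post implements: the phrase at index n (counting from 0) is written as a pointer of ceil(log2(n + 1)) bits followed by one literal bit. The function name `encoded_bits` is hypothetical and not part of the original post.

```python
def encoded_bits(num_phrases):
    """Total output bits when each phrase costs a pointer plus one literal bit.

    Assumption: the phrase at index n (from 0) uses a pointer of
    ceil(log2(n + 1)) bits, which equals n.bit_length() for n >= 0.
    """
    return sum(n.bit_length() + 1 for n in range(num_phrases))


# The 13-bit input 1011010100010 parses into the 7 phrases
# 1, 0, 11, 01, 010, 00, 10, so the output needs:
print(encoded_bits(7))  # 21
```

Under these assumptions a 13-bit input already costs 21 bits, because the early phrases are only one or two bits long while each one still pays for a pointer and a literal bit.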

Main encoder

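def LZ_encode(uncompressed):
    dictionary = dict_gen(uncompressed)
    # Keys are binary numerals stored as ints: 1, 10, 11, 100, ...
    keys = [int(bin(i)[2:]) for i in range(1, len(dictionary))]
    # One (pointer, bit) pair per phrase: a pointer to the phrase without
    # its last symbol, followed by that last symbol.
    pointer_bit = [(str(chopped_lookup(k, dictionary)), dictionary[k][-1])
                   for k in keys]
    new_pointer_bit = pointer_length_correct(pointer_bit)
    list_output = [i for sub in new_pointer_bit for i in sub]
    # Drop the '$' end marker if the input ended inside a known phrase.
    if list_output and list_output[-1] == '$':
        list_output = list_output[:-1]
    return ''.join(list_output)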

Helper functions

def dict_gen(m):  # Generates the phrase dictionary
    dictionary = {0: ""}
    j = 1   # index of the next phrase
    w = ""  # prefix read so far that is already in the dictionary
    for position, c in enumerate(m, start=1):
        wc = w + str(c)
        if wc in dictionary.values():
            w = wc
            if position == len(m):
                # Input ended inside a known phrase: mark it with '$'.
                dictionary[int(bin(j)[2:])] = wc + '$'
        else:
            dictionary[int(bin(j)[2:])] = wc
            w = ""
            j += 1
    return dictionary

def chopped_lookup(k, dictionary):  # Returns the key of the phrase with its last symbol removed
    cut_source_string = dictionary[k][:-1]
    for key, value in dictionary.items():  # items() runs on Python 2 and 3
        if value == cut_source_string:
            return key

def pointer_length_correct(lst):  # Takes the (pointer, bit) list and corrects the length of each pointer
    new_pointer_bit = []
    for n, (pointer, bit) in enumerate(lst):
        # Target width ceil(log2(n + 1)), computed exactly as n.bit_length().
        width = n.bit_length()
        pointer = str(pointer)
        if len(pointer) > width:
            pointer = pointer[len(pointer) - width:]  # drop leading characters
        else:
            pointer = pointer.zfill(width)            # pad with leading zeros
        new_pointer_bit.append((pointer, bit))
    return new_pointer_bit
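For comparison, here is a compact, self-contained sketch of the same (pointer, bit) idea that can be used to compare input and output lengths. It is an illustration under assumptions, not the method from the referenced document: the name `lz_encode_sketch`, the phrase-to-index dictionary, and the handling of a leftover phrase at the end of the input (pointer only, no literal bit) are all choices made for this example.

```python
def lz_encode_sketch(bits):
    """Encode a binary string as concatenated (pointer, bit) pairs, LZ78 style."""
    phrases = {"": 0}  # phrase -> index; index 0 is the empty phrase
    out = []
    w = ""
    for c in bits:
        if w + c in phrases:
            w += c                  # keep extending until the phrase is new
            continue
        n = len(phrases) - 1        # number of phrases emitted so far
        width = n.bit_length()      # equals ceil(log2(n + 1))
        pointer = format(phrases[w], "b").zfill(width) if width else ""
        out.append(pointer + c)
        phrases[w + c] = len(phrases)
        w = ""
    if w:                           # input ended inside a known phrase
        n = len(phrases) - 1
        out.append(format(phrases[w], "b").zfill(n.bit_length()))
    return "".join(out)


short = "1011010100010"
print(len(short), len(lz_encode_sketch(short)))          # 13 21
repetitive = "0" * 1000
print(len(repetitive), len(lz_encode_sketch(repetitive)))
```

With this sketch the 13-bit input grows to 21 bits, while the 1000-bit run of zeros shrinks. That suggests the expansion on short inputs may be a property of the scheme itself rather than a bug: savings only appear once phrases become longer than the pointers that refer to them.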

0 answers:

No answers yet