python-lzw does not decompress larger blobs

Asked: 2018-07-19 01:43:47

Tags: python lzw

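I am new to Python, and we have been trying to use the lzw code from GitHub in our program: https://github.com/joeatwork/python-lzw/blob/master/lzw/init.py

It works well when the blob is small, but once the blob gets larger it is no longer decompressed. I have been reading the documentation but cannot understand the following passage, which may be the reason the full blob is not decompressed.

I have also attached the Python code snippet I am using.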

Our control codes are
    - CLEAR_CODE (codepoint 256). When this code is encountered, we flush
      the codebook and start over.
    - END_OF_INFO_CODE (codepoint 257). This code is reserved for
      encoder/decoders over the integer codepoint stream (like the
      mechanical bit that unpacks bits into codepoints)
When dealing with bytes, codes are emitted as variable
length bit strings packed into the stream of bytes.
codepoints are written with varying length
    - initially 9 bits
    - at 512 entries 10 bits
    - at 1025 entries at 11 bits
    - at 2048 entries 12 bits
    - with max of 4095 entries in a table (including Clear and EOI)
code points are stored with their MSB in the most significant bit
available in the output character.
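
To make the bit-packing description above more concrete, here is a minimal sketch of reading codes from a byte string with the most significant bit first. It is not the library's own unpacker: the name `unpack_codes` is hypothetical, and it assumes a fixed width of 9 bits, whereas the real stream widens its codes as the codebook grows according to the list above.

```python
def unpack_codes(data, width=9):
    """Split a byte string into fixed-width codes, most significant bit first.

    Illustrative only: a real LZW stream changes `width` as the codebook grows.
    """
    codes = []
    acc = 0    # bit accumulator holding bits not yet emitted as a code
    nbits = 0  # number of valid bits currently in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            codes.append((acc >> nbits) & ((1 << width) - 1))
            acc &= (1 << nbits) - 1  # drop the bits just consumed
    return codes


# 0x80 0x10 0x40 holds the 9-bit codes 256 (CLEAR_CODE) and 65, then 6 padding bits.
print(unpack_codes(bytes([0x80, 0x10, 0x40])))  # [256, 65]
```

Leftover bits shorter than one code are treated as padding and ignored in this sketch.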

My code snippet:

import html2text
import lzw
import pandas as pd

# NOTE: striprtf() is called below but is not defined in this snippet.


def decompress_without_eoi(buf):
    # Decompress LZW data into bytes, ignoring the "End of information code" error.
    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            # Swallow only the missing-EOI error; re-raise anything else.
            if 'End of information code' not in repr(exc):
                raise

    try:
        return b''.join(gen())
    except Exception:
        # Fall back to decoding byte by byte and keep whatever was decoded
        # before the failure.
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception:
            # Any other decoding error is silently dropped here, so the
            # result may be truncated.
            pass
        # Always return bytes (the original returned a list on this path).
        return b''.join(deblob)


# Current function used to decode a blob.
def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]

        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True

        # Drop the last 10 bytes before decoding.
        if not isinstance(blob, bytes):
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]

        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))

The function is called as follows:

nf['IS_BLOB'] = nf[['IS_BLOB','COMPRESSION']].apply(deblob3,axis=1)
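
One way to narrow down where a large blob stops decoding is to keep the exception instead of discarding it. The sketch below is only a diagnostic idea, not part of python-lzw: `drain_until_error` and `fake_decoder` are hypothetical names, and `fake_decoder` merely stands in for a call such as `lzw.decompress(buf)` so that the example runs on its own.

```python
def drain_until_error(chunks):
    """Collect bytes from an iterable until it finishes or raises.

    Returns (data, error); error is None when the iterable ran to completion.
    """
    collected = []
    error = None
    try:
        for chunk in chunks:
            collected.append(chunk)
    except Exception as exc:  # broad on purpose: the error is recorded, not hidden
        error = exc
    return b"".join(collected), error


def fake_decoder():
    # Hypothetical stand-in for a decoder: yields two chunks, then fails.
    yield b"he"
    yield b"llo"
    raise ValueError("simulated decoding failure")


data, error = drain_until_error(fake_decoder())
print(len(data), repr(error))
```

Comparing the length of `data` with the expected size, together with the recorded `error`, shows whether the output was cut short and by which exception.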

0 answers:

No answers