Saving LTImage objects extracted by pdfminer

Time: 2019-06-20 09:21:07

Tags: python image image-processing python-imaging-library pdfminer

I am using Python 3 and have run into a strange problem with pdfminer(.six). I am using code similar to the OP's in this question: Python pdfminer extract image produces multiple images per page (should be single image)

# ltimageobject is an LTImage object obtained from the previous script
from PIL import Image

im = Image.frombytes(mode="1",
                     data=ltimageobject.stream.data,
                     size=ltimageobject.srcsize,
                     decoder_name="raw")
im.save("new_fromIm.jpg")  # this method works
# ltimageobject.stream.get_data() would also work to obtain the data.
# get_rawdata() or rawdata won't work.

However, I cannot get a usable image by writing the bytes to a file myself.

with open("new_decoded.jpg", "wb") as new:
    new.write(im.tobytes(encoder_name="raw"))
    #this doesn't work
with open("new_decoded2.jpg", "w", encoding=codecs.open("raw")) as new:
    new.write(av)
    #this doesn't work either
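For comparison, here is a minimal sketch of the difference between the raw pixel bytes returned by `tobytes()` and an encoded JPEG produced by `save()`. The 16x8 blank image is a hypothetical stand-in for the extracted bitmap, not data from the PDF in question.

```python
import io

from PIL import Image

# Hypothetical stand-in for the extracted bitmap: a blank 16x8 one-bit image.
im = Image.new("1", (16, 8))

# tobytes() returns only the packed pixel data; no file header is added.
raw_pixels = im.tobytes()

# save() runs the JPEG encoder and builds a complete file, here in memory.
buffer = io.BytesIO()
im.save(buffer, format="JPEG")
jpeg_bytes = buffer.getvalue()

print(len(raw_pixels))   # 2 bytes per row * 8 rows = 16
print(jpeg_bytes[:2])    # b'\xff\xd8', the JPEG start-of-image marker
```

If this is what happens with the extracted data, a file made of `raw_pixels` alone would carry a `.jpg` name without being a JPEG, which may be why image viewers reject it.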

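The file written by my attempt is slightly shorter (1215 bytes instead of 1247 bytes). I assume all of the pixel data is in it and that the missing bytes are whatever wraps the image. For the record, I found the correct mode with the following snippet: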

encoders = Image.ENCODERS
print(encoders)
# savelt: the LTImage under test (presumably the same object as ltimageobject)
for mode in encoders:
    try:
        print(Image.frombytes(mode=mode, data=savelt.stream.data,
                              size=savelt.srcsize, decoder_name="raw"))
    except Exception as x:
        # print("fail on mode {}: {}".format(mode, x))
        continue

Note that ltimageobject.srcsize != (int(ltimageobject.height), int(ltimageobject.width)).
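As a rough cross-check on mode and size, the sketch below computes how many bytes an unencoded bitmap should occupy. It assumes rows are padded to whole bytes, which is how Pillow's raw decoder treats mode "1". The helper name `expected_raw_length` and the 100x97 dimensions are made up for illustration. In pdfminer.six, `srcsize` should hold the pixel dimensions recorded in the image stream, while `width` and `height` describe the box the image occupies on the page, so `srcsize` is the value to compare against.

```python
def expected_raw_length(width, height, bits_per_pixel=1):
    """Bytes needed for an unencoded bitmap whose rows are padded to whole bytes."""
    bytes_per_row = (width * bits_per_pixel + 7) // 8
    return bytes_per_row * height


# Hypothetical 100x97 one-bit image: 13 bytes per row, 97 rows.
print(expected_raw_length(100, 97))  # 1261
```

If `len(ltimageobject.stream.data)` matches this value for `srcsize`, the data is most likely plain pixel rows. `Image.frombytes` raises a `ValueError` when the buffer is too short for the requested mode and size.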

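What am I doing wrong when I try to write the bytes to a file directly? It is meant to be the quick and dirty way, but it should work. To be precise, I cannot find the magic number b"\xff\xd8" at the start of the data.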
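One way to check what the bytes actually are is to compare their first few bytes with known file signatures. This is only a sketch: the `SIGNATURES` table and the `sniff_image_format` helper are hypothetical names, and the table covers just three common formats.

```python
# Leading byte signatures of a few common image file formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_format(data):
    """Return the format name if data starts with a known signature, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 8))  # jpeg
print(sniff_image_format(b"\x00" * 16))                        # None
```

A result of `None` for `ltimageobject.stream.data` would suggest that the stream holds unencoded pixel data rather than a JPEG file. In that case the bytes have to go through an encoder such as `Image.save` before a `.jpg` file can be opened by a viewer.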

0 answers:

No answers.