Question

I'm using Python 3 and have run into a strange problem with pdfminer (.six). I'm using code similar to the OP's in this question: Python pdfminer extract image produces multiple images per page (should be single image)

# ltimageobject is an object obtained from the previous script
from PIL import Image

im = Image.frombytes(mode="1",
                     data=ltimageobject.stream.data,
                     size=ltimageobject.srcsize,
                     decoder_name="raw")
im.save("new_fromIm.jpg")  # this method works
# ltimageobject.stream.get_data() would also work to obtain the data.
# get_rawdata() or rawdata won't work.

But I can't get the same result by writing the data out myself as a bytes file.

with open("new_decoded.jpg", "wb") as new:
    new.write(im.tobytes(encoder_name="raw"))
    #this doesn't work
with open("new_decoded2.jpg", "w", encoding=codecs.open("raw")) as new:
    new.write(av)
    #this doesn't work either

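The file is slightly shorter (1215 bytes instead of 1247). I guess it contains all the content, and the missing bytes are the wrapper around the image. To be precise, I found the correct mode using the following snippet: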

encoders = Image.ENCODERS
print(encoders)
for mode in encoders:
    try:
        print(Image.frombytes(mode=mode, data=savelt.stream.data,
                              size=savelt.srcsize, decoder_name="raw"))
    except Exception as x:
        # print("fail on mode {}: {}".format(mode, x))
        continue

Note that ltimageobject.srcsize != (int(ltimageobject.height), int(ltimageobject.width))

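What am I doing wrong when I try to write the bytes to a file directly? That should be the quick and dirty way, but it ought to work. To be precise, I can't find the magic number b"\xff\xd8" at the start of the data.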
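For anyone comparing the two outputs, here is a minimal sketch of the check described above. It assumes that `im.tobytes(encoder_name="raw")` returns uncompressed pixel data rather than an encoded JPEG stream, and that raw mode "1" data is packed 8 pixels per byte with each row padded to a whole byte. The helper names `looks_like_jpeg` and `raw_1bit_size` and the 16x4 sample size are hypothetical and only serve the illustration; they are not part of pdfminer or Pillow.

```python
import math

JPEG_SOI = b"\xff\xd8"  # JPEG "start of image" marker (the magic number)


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data begins with the JPEG start-of-image marker."""
    return data[:2] == JPEG_SOI


def raw_1bit_size(width: int, height: int) -> int:
    """Bytes needed for raw 1-bit pixels, assuming each row is padded to a whole byte."""
    return math.ceil(width / 8) * height


# Hypothetical stand-in for ltimageobject.stream.data: 16x4 pixels of raw bits.
raw_pixels = bytes(raw_1bit_size(16, 4))

# Raw pixel data carries no JPEG header, so the marker is absent.
print(looks_like_jpeg(raw_pixels))

# An encoded JPEG stream starts with FF D8.
print(looks_like_jpeg(JPEG_SOI + b"\xff\xe0"))
```

If the data written with `new.write(...)` fails this check while the file produced by `im.save("new_fromIm.jpg")` passes it, that would suggest the difference between the two files is the JPEG encoding step, which `im.save` performs based on the file extension.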

Saving LTImages extracted by pdfminer

0 answers: