I am using Python 3 and have run into an odd problem with pdfminer(.six). I am using code similar to the OP's in this question: Python pdfminer extract image produces multiple images per page (should be single image)
# ltimageobject is an object obtained from the previous script
from PIL import Image

im = Image.frombytes(
    mode="1",
    data=ltimageobject.stream.data,
    size=ltimageobject.srcsize,
    decoder_name="raw",
)
im.save("new_fromIm.jpg")  # this method works
# ltimageobject.stream.get_data() would also work to obtain the data.
# get_rawdata() or rawdata won't work.
However, I cannot get a valid image by writing the bytes straight to a file.
with open("new_decoded.jpg", "wb") as new:
    new.write(im.tobytes(encoder_name="raw"))
# this doesn't work
with open("new_decoded2.jpg", "w", encoding=codecs.open("raw")) as new:
    new.write(av)
# this doesn't work either
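The two attempts above differ in how the file is opened. As a minimal stdlib-only sketch (the helper names `write_and_measure` and `text_mode_rejects_bytes` are hypothetical, not part of pdfminer or Pillow): binary mode writes the bytes unchanged, while text mode refuses `bytes` altogether. This suggests that a size difference in the output comes from the data being written, not from the write itself.

```python
import os
import tempfile


def write_and_measure(payload: bytes) -> int:
    """Write payload in binary mode and return the resulting file size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload.bin")
        with open(path, "wb") as fh:  # "wb": bytes go to disk untouched
            fh.write(payload)
        return os.path.getsize(path)


def text_mode_rejects_bytes(payload: bytes) -> bool:
    """Return True if writing bytes to a text-mode file raises TypeError."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload.txt")
        try:
            # Text mode expects str; `encoding` must be a codec name string.
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(payload)
        except TypeError:
            return True
        return False


if __name__ == "__main__":
    sample = bytes(range(256))
    print(write_and_measure(sample) == len(sample))
    print(text_mode_rejects_bytes(sample))
```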
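The resulting file is slightly shorter (1215 bytes instead of 1247 bytes). I assume it contains all the image content and that the missing bytes are whatever wraps the image. I should add that I found the correct mode using the following snippet: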
encoders = Image.ENCODERS
print(encoders)
for mode in encoders:
    try:
        print(Image.frombytes(mode=mode, data=savelt.stream.data,
                              size=savelt.srcsize, decoder_name="raw"))
    except Exception as x:
        # print("fail on mode {}: {}".format(mode, x))
        continue
Note that ltimageobject.srcsize != (int(ltimageobject.height), int(ltimageobject.width))
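Since the size passed to `Image.frombytes` matters here, one sanity check is to compare the length of the stream data with what an unencoded bitmap of that size would occupy. The sketch below assumes the data is a raw bitmap whose rows are padded to whole bytes, as Pillow's raw decoder expects for mode "1". Whether the PDF stream really is raw depends on its filter, which is not known here. `expected_raw_length` is a hypothetical helper, and Pillow sizes are given as (width, height).

```python
def expected_raw_length(width: int, height: int, bits_per_pixel: int = 1) -> int:
    """Byte count of an unencoded bitmap whose rows are padded to whole bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8  # round each row up to a byte
    return row_bytes * height


if __name__ == "__main__":
    # A 10x3 one-bit image needs 2 bytes per row, so 6 bytes in total.
    print(expected_raw_length(10, 3))
```

If `len(ltimageobject.stream.data)` does not match the value computed from `srcsize`, the size or the mode is probably not what the raw decoder needs.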
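What am I doing wrong when I try to write the bytes to a file directly? It is the quick-and-dirty approach, but it should work. I should add that I cannot find the magic number b"\xff\xd8" at the start of the data.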
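The magic-number check mentioned above can be sketched as a small helper. `sniff_image_format` is a hypothetical name and the signature table is deliberately short. The idea is only to tell already-encoded image data (which can be written to disk as-is) from data that still needs an encoder such as `im.save`.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess a container format from the leading bytes of `data`."""
    signatures = {
        b"\xff\xd8": "jpeg",          # JPEG start-of-image marker
        b"\x89PNG\r\n\x1a\n": "png",  # PNG file signature
        b"GIF87a": "gif",
        b"GIF89a": "gif",
        b"II*\x00": "tiff",           # little-endian TIFF
        b"MM\x00*": "tiff",           # big-endian TIFF
    }
    for magic, name in signatures.items():
        if data.startswith(magic):
            return name
    return "unknown"  # e.g. raw pixel data with no file header


if __name__ == "__main__":
    print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
    print(sniff_image_format(b"\x00" * 16))
```

Data reported as "unknown" is not a complete image file, so writing it to a file named `.jpg` would not by itself produce a readable JPEG.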