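I often need to extract images that have been copy-pasted into Excel files. Unfortunately, these files are in the despicable XLS format, so the simple unzip trick does not work, and I decided to try writing a small Python script to do it myself.
(Extracting the images is painful because I actually have to copy-paste each one into Paint to save it. There is no "Save as..." or "Export" button.)
If you look at the PNG reference (or already know it), you will see that a PNG file basically starts with the \x89PNG signature and ends with an IEND chunk.
So I tried the following code: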
import sys

def info(s):
    print("[i] " + s)

def error(s):
    print("[!] " + s)

info("Opening file: " + sys.argv[1])
with open(sys.argv[1], 'rb') as f:
    buf = f.read()
info("File read")

# Locate the 8-byte PNG signature.
offset_s = buf.find(b'\x89PNG\x0D\x0A\x1A\x0A')
if offset_s == -1:
    error("PNG not found")
    sys.exit(-1)
else:
    info("PNG start found at offset: {}".format(offset_s))

# Locate the IEND chunk type, searching only after the signature.
offset_e = buf.find(b'IEND', offset_s)
if offset_e == -1:
    error("PNG end not found")
    sys.exit(-1)
else:
    offset_e += 8  # skip the 4-byte chunk type and the 4-byte CRC
    info("PNG end found at offset: {}".format(offset_e))

with open("out.png", "wb") as f:
    f.write(buf[offset_s:offset_e])
info("Written to out.png")
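So it extracts the data. But the PNG data is corrupted (in the IDAT chunk), so the image does not display properly. Here is the result of running pngcheck: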
File: out.png (221879 bytes)
chunk IHDR at offset 0x0000c, length 0
1366 x 768 image, 24-bit RGB, non-interlaced
chunk sRGB at offset 0x00025, length 0
rendering intent = perceptual
chunk pHYs at offset 0x00032, length 0: 3780x3780 pixels/meter (96 dpi)
chunk IDAT at offset 0x00047, length 0
zlib: deflated, 32K window, fast compression
CRC error in chunk IDAT (actual 632dd60d, should be 5985ed29)
Chunk name fffffffb 02 ffffff8a 5e doesn't conform to naming rules.
chunk ?? at offset 0x10008, length 0
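The kind of check pngcheck reports above can be approximated in a few lines of Python. This is only a minimal sketch: `walk_png_chunks` and `make_chunk` are hypothetical helper names, not part of any library. It relies on the standard PNG chunk layout (4-byte big-endian length, 4-byte type, data, then a CRC-32 computed over the type and data).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, body):
    """Build one PNG chunk (hypothetical helper, used here only for testing)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

def walk_png_chunks(data):
    """Yield (chunk_type, length, crc_ok) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk header: 4-byte big-endian length followed by the 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            raise ValueError("chunk {!r} runs past the end of the data".format(ctype))
        (stored_crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the type field and the data, but not the length field.
        actual_crc = zlib.crc32(data[pos + 4:body_end]) & 0xFFFFFFFF
        yield ctype, length, stored_crc == actual_crc
        if ctype == b"IEND":
            break
        pos = body_end + 4
```

Running it over out.png should flag the first chunk whose CRC fails. Extra bytes inserted into the IDAT data would produce exactly this symptom, because every later chunk boundary shifts.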
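Do you think (or know for a fact? I found no information on this when searching) that Excel stores PNG files with a specific (or even proprietary) filter/compression algorithm?
Any ideas on how to make this work?
Edit - research follow-up: I kept analyzing. I took a larger image, put it in a blank Excel file, and saved it as XLS.
Then I extracted it with my previous tool and wrote a new tool to identify the 4-byte items that Excel adds. Here is the code: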
import sys

def info(s):
    print("[i] " + s)

def die(s):
    print("[!] " + s)
    sys.exit(-1)

info("Opening original file: " + sys.argv[1])
i = 0  # current offset in the changed file
with open(sys.argv[1], 'rb') as original:
    info("Opening changed file: " + sys.argv[2])
    with open(sys.argv[2], 'rb') as changed:
        o_byte = original.read(1)
        c_byte = changed.read(1)
        while o_byte != b"":
            if c_byte == b"":
                die("Error reading from changed file.")
            while c_byte != o_byte:
                # Treat every mismatch as a 4-byte item inserted by Excel.
                rest = changed.read(3)
                if c_byte == b"" or len(rest) < 3:
                    die("Error reading from changed file.")
                info("{:08X} - Found diff: 0x{:02X} 0x{:02X} 0x{:02X} 0x{:02X}".format(i, ord(c_byte), *rest))
                i += 4
                c_byte = changed.read(1)
            o_byte = original.read(1)
            c_byte = changed.read(1)
            i += 1
Running it against my original PNG and the PNG extracted from the XLS, I get the following output:
[i] Opening original file: test1.PNG
[i] Opening changed file: out.png
[i] 00001FAB - Found diff: 0xEB 0x00 0x20 0x20
[i] 00003FCF - Found diff: 0x3C 0x00 0x20 0x20
[i] 00005FF3 - Found diff: 0x3C 0x00 0x20 0x20
[i] 00008017 - Found diff: 0x3C 0x00 0x20 0x20
[i] 000090BE - Found diff: 0x81 0x00 0x00 0x00
[i] 000090C2 - Found diff: 0x82 0x00 0x00 0x00
[i] 000090C6 - Found diff: 0x83 0x00 0x00 0x00
[i] 000090CA - Found diff: 0x84 0x00 0x00 0x00
[i] 000090CE - Found diff: 0x85 0x00 0x00 0x00
[i] 000090D2 - Found diff: 0x86 0x00 0x00 0x00
[i] 000090D6 - Found diff: 0x87 0x00 0x00 0x00
[i] 000090DA - Found diff: 0x88 0x00 0x00 0x00
[i] 000090DE - Found diff: 0x89 0x00 0x00 0x00
[i] 000090E2 - Found diff: 0x8A 0x00 0x00 0x00
[i] 000090E6 - Found diff: 0x8B 0x00 0x00 0x00
[i] 000090EA - Found diff: 0x8C 0x00 0x00 0x00
[i] 000090EE - Found diff: 0x8D 0x00 0x00 0x00
[i] 000090F2 - Found diff: 0x8E 0x00 0x00 0x00
[i] 000090F6 - Found diff: 0x8F 0x00 0x00 0x00
[i] 000090FA - Found diff: 0x90 0x00 0x00 0x00
[i] 000090FE - Found diff: 0x91 0x00 0x00 0x00
[i] 00009102 - Found diff: 0x92 0x00 0x00 0x00
[i] 00009106 - Found diff: 0x93 0x00 0x00 0x00
[i] 0000910A - Found diff: 0x94 0x00 0x00 0x00
[i] 0000910E - Found diff: 0x95 0x00 0x00 0x00
[i] 00009112 - Found diff: 0x96 0x00 0x00 0x00
[i] 00009116 - Found diff: 0x97 0x00 0x00 0x00
[i] 0000911A - Found diff: 0x98 0x00 0x00 0x00
[i] 0000911E - Found diff: 0x99 0x00 0x00 0x00
[i] 00009122 - Found diff: 0x9A 0x00 0x00 0x00
[i] 00009126 - Found diff: 0x9B 0x00 0x00 0x00
[i] 0000912A - Found diff: 0x9C 0x00 0x00 0x00
[i] 0000912E - Found diff: 0x9D 0x00 0x00 0x00
[i] 00009132 - Found diff: 0x9E 0x00 0x00 0x00
[i] 00009136 - Found diff: 0x9F 0x00 0x00 0x00
[i] 0000913A - Found diff: 0xA0 0x00 0x00 0x00
[i] 0000913E - Found diff: 0xA1 0x00 0x00 0x00
[i] 00009142 - Found diff: 0xA2 0x00 0x00 0x00
[i] 00009146 - Found diff: 0xA3 0x00 0x00 0x00
[i] 0000914A - Found diff: 0xA4 0x00 0x00 0x00
[i] 0000914E - Found diff: 0xA5 0x00 0x00 0x00
[i] 00009152 - Found diff: 0xA6 0x00 0x00 0x00
[i] 00009156 - Found diff: 0xA7 0x00 0x00 0x00
[i] 0000915A - Found diff: 0xA8 0x00 0x00 0x00
[i] 0000915E - Found diff: 0xA9 0x00 0x00 0x00
[i] 00009162 - Found diff: 0xAA 0x00 0x00 0x00
[i] 00009166 - Found diff: 0xAB 0x00 0x00 0x00
[i] 0000916A - Found diff: 0xAC 0x00 0x00 0x00
[i] 0000916E - Found diff: 0xAD 0x00 0x00 0x00
[i] 00009172 - Found diff: 0xAE 0x00 0x00 0x00
[i] 00009176 - Found diff: 0xAF 0x00 0x00 0x00
[i] 0000917A - Found diff: 0xB0 0x00 0x00 0x00
[i] 0000917E - Found diff: 0xB1 0x00 0x00 0x00
[i] 00009182 - Found diff: 0xB2 0x00 0x00 0x00
[i] 00009186 - Found diff: 0xB3 0x00 0x00 0x00
[i] 0000918A - Found diff: 0xB4 0x00 0x00 0x00
[i] 0000918E - Found diff: 0xB5 0x00 0x00 0x00
[i] 00009192 - Found diff: 0xB6 0x00 0x00 0x00
[i] 00009196 - Found diff: 0xB7 0x00 0x00 0x00
[i] 0000919A - Found diff: 0xB8 0x00 0x00 0x00
[i] 0000919E - Found diff: 0xB9 0x00 0x00 0x00
[i] 000091A2 - Found diff: 0xBA 0x00 0x00 0x00
[i] 000091A6 - Found diff: 0xBB 0x00 0x00 0x00
[i] 000091AA - Found diff: 0xBC 0x00 0x00 0x00
[i] 000091AE - Found diff: 0xBD 0x00 0x00 0x00
[i] 000091B2 - Found diff: 0xBE 0x00 0x00 0x00
[i] 000091B6 - Found diff: 0xBF 0x00 0x00 0x00
[i] 000091BA - Found diff: 0xC0 0x00 0x00 0x00
[i] 000091BE - Found diff: 0xC1 0x00 0x00 0x00
[i] 000091C2 - Found diff: 0xC2 0x00 0x00 0x00
[i] 000091C6 - Found diff: 0xC3 0x00 0x00 0x00
[i] 000091CA - Found diff: 0xC4 0x00 0x00 0x00
[i] 000091CE - Found diff: 0xC5 0x00 0x00 0x00
[i] 000091D2 - Found diff: 0xC6 0x00 0x00 0x00
[i] 000091D6 - Found diff: 0xC7 0x00 0x00 0x00
[i] 000091DA - Found diff: 0xC8 0x00 0x00 0x00
[i] 000091DE - Found diff: 0xC9 0x00 0x00 0x00
[i] 000091E2 - Found diff: 0xCA 0x00 0x00 0x00
[i] 000091E6 - Found diff: 0xCB 0x00 0x00 0x00
[i] 000091EA - Found diff: 0xCC 0x00 0x00 0x00
[i] 000091EE - Found diff: 0xCD 0x00 0x00 0x00
[i] 000091F2 - Found diff: 0xCE 0x00 0x00 0x00
[i] 000091F6 - Found diff: 0xCF 0x00 0x00 0x00
[i] 000091FA - Found diff: 0xD0 0x00 0x00 0x00
[i] 000091FE - Found diff: 0xD1 0x00 0x00 0x00
[i] 00009202 - Found diff: 0xD2 0x00 0x00 0x00
[i] 00009206 - Found diff: 0xD3 0x00 0x00 0x00
[i] 0000920A - Found diff: 0xD4 0x00 0x00 0x00
[i] 0000920E - Found diff: 0xD5 0x00 0x00 0x00
[i] 00009212 - Found diff: 0xD6 0x00 0x00 0x00
[i] 00009216 - Found diff: 0xD7 0x00 0x00 0x00
[i] 0000921A - Found diff: 0xD8 0x00 0x00 0x00
[i] 0000921E - Found diff: 0xD9 0x00 0x00 0x00
[i] 00009222 - Found diff: 0xDA 0x00 0x00 0x00
[i] 00009226 - Found diff: 0xDB 0x00 0x00 0x00
[i] 0000922A - Found diff: 0xDC 0x00 0x00 0x00
[i] 0000922E - Found diff: 0xDD 0x00 0x00 0x00
[i] 00009232 - Found diff: 0xDE 0x00 0x00 0x00
[i] 00009236 - Found diff: 0xDF 0x00 0x00 0x00
[i] 0000923A - Found diff: 0xE0 0x00 0x00 0x00
[i] 0000923E - Found diff: 0xE1 0x00 0x00 0x00
[i] 00009242 - Found diff: 0xE2 0x00 0x00 0x00
[i] 00009246 - Found diff: 0xE3 0x00 0x00 0x00
[i] 0000924A - Found diff: 0xE4 0x00 0x00 0x00
[i] 0000924E - Found diff: 0xE5 0x00 0x00 0x00
[i] 00009252 - Found diff: 0xE6 0x00 0x00 0x00
[i] 00009256 - Found diff: 0xE7 0x00 0x00 0x00
[i] 0000925A - Found diff: 0xE8 0x00 0x00 0x00
[i] 0000925E - Found diff: 0xE9 0x00 0x00 0x00
[i] 00009262 - Found diff: 0xEA 0x00 0x00 0x00
[i] 00009266 - Found diff: 0xEB 0x00 0x00 0x00
[i] 0000926A - Found diff: 0xEC 0x00 0x00 0x00
[i] 0000926E - Found diff: 0xED 0x00 0x00 0x00
[i] 00009272 - Found diff: 0xEE 0x00 0x00 0x00
[i] 00009276 - Found diff: 0xEF 0x00 0x00 0x00
[i] 0000927A - Found diff: 0xF0 0x00 0x00 0x00
[i] 0000927E - Found diff: 0xF1 0x00 0x00 0x00
[i] 00009282 - Found diff: 0xF2 0x00 0x00 0x00
[i] 00009286 - Found diff: 0xF3 0x00 0x00 0x00
[i] 0000928A - Found diff: 0xFE 0xFF 0xFF 0xFF
[i] 0000928E - Found diff: 0xFE 0xFF 0xFF 0xFF
[i] 00009292 - Found diff: 0xF6 0x00 0x00 0x00
[i] 00009296 - Found diff: 0xFE 0xFF 0xFF 0xFF
[i] 0000929A - Found diff: 0xFE 0xFF 0xFF 0xFF
[i] 0000929E - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092A2 - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092A6 - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092AA - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092AE - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092B2 - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092B6 - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 000092BA - Found diff: 0xFF 0xFF 0xFF 0xFF
[i] 0000A23B - Found diff: 0x3C 0x00 0x20 0x20
[i] 0000C25F - Found diff: 0x3C 0x00 0x20 0x20
[i] 0000E283 - Found diff: 0x3C 0x00 0x20 0x20
[i] 000102A7 - Found diff: 0x3C 0x00 0x20 0x20
[i] 000122CB - Found diff: 0x3C 0x00 0x20 0x20
[i] 000142EF - Found diff: 0x3C 0x00 0x20 0x20
[i] 00016313 - Found diff: 0x3C 0x00 0x20 0x20
[i] 00018337 - Found diff: 0x3C 0x00 0x20 0x20
[i] 0001A35B - Found diff: 0x3C 0x00 0x0D 0x0B
What on earth is this 0x3C? And why does Excel start counting at some point (0x81, 0x82, 0x83, ...)?
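Edit - further hints: it seems that 0x003C is the identifier of the CONTINUE record in the Excel file format (see https://www.openoffice.org/sc/excelfileformat.pdf).
The counting might be the compound document SSAT table, but I am not sure.
I still have no idea about the 0xEB, though.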
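One way to read the diff output, offered as a sketch rather than a confirmed diagnosis: if each 4-byte item is a BIFF record header (a 2-byte little-endian identifier followed by a 2-byte little-endian payload size), then 0x3C 0x00 0x20 0x20 decodes to identifier 0x003C with size 0x2020 (8224 bytes). That matches the 0x2024-byte spacing between consecutive diff offsets: 4 header bytes plus 8224 payload bytes. Read the same way, 0xEB 0x00 would be identifier 0x00EB, which Microsoft's [MS-XLS] specification lists as MsoDrawingGroup, as far as I can tell. The function names below are hypothetical. The code assumes the Workbook stream has already been pulled out of the compound document as one contiguous byte string, for example with a compound-file reader. That would also sidestep the run of 0x81, 0x82, ... entries if those are sector-table data.

```python
import struct

# Record identifiers (assumption: values as listed in the BIFF documentation).
MSODRAWINGGROUP = 0x00EB
CONTINUE = 0x003C

def make_record(rec_id, payload):
    """Build one BIFF record (hypothetical helper, used here only for testing)."""
    return struct.pack("<HH", rec_id, len(payload)) + payload

def iter_biff_records(stream):
    """Yield (record_id, payload) pairs from a contiguous BIFF stream."""
    pos = 0
    while pos + 4 <= len(stream):
        # Record header: 2-byte identifier, then 2-byte payload size.
        rec_id, size = struct.unpack_from("<HH", stream, pos)
        pos += 4
        yield rec_id, stream[pos:pos + size]
        pos += size

def drawing_group_payload(stream):
    """Join drawing-group payloads, dropping the 4-byte record headers."""
    parts = []
    in_group = False
    for rec_id, payload in iter_biff_records(stream):
        if rec_id == MSODRAWINGGROUP:
            parts.append(payload)
            in_group = True
        elif rec_id == CONTINUE and in_group:
            # CONTINUE only counts when it directly extends a drawing group.
            parts.append(payload)
        else:
            in_group = False
    return b"".join(parts)
```

The PNG signature and IEND search from the first script could then be run on the result of `drawing_group_payload` instead of on the raw file, so that the record headers never end up inside the extracted image.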
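Answer 0 (score: 0)
If you are running on Windows, or can use a VM for this task, you may want to use the COM interface to do this; you can even drive it from Python with pywin32. For an example, see this question: Export Charts from Excel as images using Python.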