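How can I find the images present in a Word document file? Is there a Python module for this? I searched but found nothing useful. This is how we read text from a Word file; the code below gives no information about the images in the file.

    from docx import Document

    document = Document('new-file-name.docx')
    for par in document.paragraphs:
        print(par.text)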
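Answer 0 (score: 6)
You first have to extract all the image files from the .docx archive (it is a zip file), find the image elements in the XML, and associate each image with its rId (relationship ID).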
    import os
    import docx
    import docx2txt
    from docx.parts.image import ImagePart

    img_folder = 'img_folder/'
    # Create the folder first in case docx2txt does not, then extract the images into it
    if not os.path.isdir(img_folder):
        os.makedirs(img_folder)
    docx2txt.process('document.docx', img_folder)

    # Open your .docx document
    doc = docx.Document('document.docx')

    # Save all 'rId: filename' relationships in a dictionary named rels
    rels = {}
    for r in doc.part.rels.values():
        if isinstance(r._target, ImagePart):
            rels[r.rId] = os.path.basename(r._target.partname)

    # Then process your text
    for paragraph in doc.paragraphs:
        # If the paragraph contains an image
        if 'Graphic' in paragraph._p.xml:
            # Get the rId of the image (substring match: 'rId1' also matches 'rId10')
            for rId in rels:
                if rId in paragraph._p.xml:
                    # The image file is at os.path.join(img_folder, rels[rId])
                    print(os.path.join(img_folder, rels[rId]))
        else:
            # It's not an image
            print(paragraph.text)
GitHub repository link: django-docx-import
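The rId lookup described in this answer can also be sketched with only the standard library, reading the relationships part and the document part directly from the archive. This is a rough illustration, not the answer author's code: the helper name images_by_paragraph is made up, and it assumes the usual (transitional) WordprocessingML namespaces and that pictures are referenced through an a:blip element with an r:embed attribute.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces commonly used in .docx parts (assumed: transitional OOXML)
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
A_NS = 'http://schemas.openxmlformats.org/drawingml/2006/main'
R_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
PKG_REL_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'
IMAGE_REL_TYPE = R_NS + '/image'


def images_by_paragraph(docx_file):
    """Return [(paragraph_index, [image targets]), ...] for paragraphs with images.

    Hypothetical helper. docx_file is a path or a binary file-like object.
    Paragraphs are counted over every w:p element in document order,
    including those nested inside tables.
    """
    with zipfile.ZipFile(docx_file) as z:
        rels_root = ET.fromstring(z.read('word/_rels/document.xml.rels'))
        doc_root = ET.fromstring(z.read('word/document.xml'))

    # Map rId -> target path (relative to word/), image relationships only
    rels = {}
    for rel in rels_root.iter('{%s}Relationship' % PKG_REL_NS):
        if rel.get('Type') == IMAGE_REL_TYPE:
            rels[rel.get('Id')] = rel.get('Target')

    result = []
    for index, p in enumerate(doc_root.iter('{%s}p' % W_NS)):
        targets = []
        # <a:blip r:embed="rIdN"> points at the embedded picture
        for blip in p.iter('{%s}blip' % A_NS):
            rid = blip.get('{%s}embed' % R_NS)
            if rid in rels:
                targets.append(rels[rid])
        if targets:
            result.append((index, targets))
    return result
```

Matching on the exact r:embed value avoids the substring problem where 'rId1' would also match 'rId10'. Calling images_by_paragraph('document.docx') should give targets such as 'media/image1.png', which are paths relative to the word/ folder inside the archive.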
Answer 1 (score: 5)
Since .docx files are zip files, you can use the zipfile module:
    import zipfile

    z = zipfile.ZipFile("1.docx")

    # Print the list of valid attributes for the ZipFile object
    print(dir(z))

    # Print all files in the zip archive
    all_files = z.namelist()
    print(all_files)

    # Get all files in the word/media/ directory
    images = [name for name in all_files if name.startswith('word/media/')]
    print(images)

    # Open an image and save it
    image1 = z.open('word/media/image1.jpeg').read()
    with open('image1.jpeg', 'wb') as f:
        f.write(image1)

    # Extract the file (the word/media/ subpath is recreated under path_to_dir)
    z.extract('word/media/image1.jpeg', r'path_to_dir')
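If you want every image rather than one known file name, the same idea can be wrapped in a small function. The following is only a sketch under the assumption that all embedded pictures live under word/media/; the name extract_images and its parameters are illustrative and not part of any library.

```python
import os
import zipfile


def extract_images(docx_file, out_dir):
    """Copy every file under word/media/ into out_dir; return the written paths.

    Hypothetical helper. docx_file is a path or a binary file-like object.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    with zipfile.ZipFile(docx_file) as z:
        for name in z.namelist():
            # Skip everything outside word/media/ and any directory entries
            if not name.startswith('word/media/') or name.endswith('/'):
                continue
            # Flatten the archive path so files land directly in out_dir
            target = os.path.join(out_dir, os.path.basename(name))
            with open(target, 'wb') as f:
                f.write(z.read(name))
            written.append(target)
    return written
```

For example, extract_images('1.docx', 'images') would write files such as images/image1.jpeg. Unlike z.extract, this does not recreate the word/media/ subfolders in the output directory.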