如何使用python将.docx文件转换为html?

时间:2017-10-30 05:27:30

标签: python-2.7

import mammoth

f = open("D:\filename.docx", 'rb')
document = mammoth.convert_to_html(f)

我运行此代码时无法获取.html文件,请帮助我获取它,当我转换为.html文件时,我没有将图像插入到.html文件的word文件中,你能帮忙吗?我如何从.docx获取图像到.html?

3 个答案:

答案 0 :(得分:2)

试试这个:

import mammoth

f = open("path_to_file.docx", 'rb')
b = open('filename.html', 'wb')
document = mammoth.convert_to_html(f)
b.write(document.value.encode('utf8'))
f.close()
b.close()

答案 1 :(得分:1)

我建议您尝试以下代码

    import mammoth
    with open("document.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value

答案 2 :(得分:0)

回答这个问题可能为时已晚,但以防万一如果有人仍在寻找在转换为下面的 html 后单词“tables/images/”应该保持不变的答案会有所帮助。

import win32com.client as win32
# Open MS Word
word = win32.gencache.EnsureDispatch('Word.Application')

doc = word.Documents.Open("D:\filename.docx")
# change to a .html
txt_path = word_file.split('.')[0] + '.html'

# wdFormatFilteredHTML has value 10
# saves the doc as an html
doc.SaveAs(txt_path, 10)

doc.Close()
# noinspection PyBroadException
try:
    word.ActiveDocument()
except Exception:
    word.Quit()