How do I convert a dotx file to an HTML file in Python?

Asked: 2019-05-25 04:59:58

Tags: python python-3.x pypandoc

Here is my code, which currently fails:

import os
import pypandoc
source_dir = 'source'
result_dir = 'result'

for file in os.listdir(source_dir):

    output_files1 = []
    source_file = source_dir + '/'+file
    output_file = result_dir + '/'+file.replace('.dotx','.html').replace('.ott','.html')
    output = pypandoc.convert_file(source_file, 'html', outputfile=output_file)

I am trying to convert dotx files to HTML files, but I get the following error:

RuntimeError: Invalid input format! Got "dotx" but expected one of
these: commonmark, creole, docbook, docx, epub, fb2, gfm, haddock, 
html, jats, json, latex, markdown, markdown_github, markdown_mmd,
markdown_phpextra, markdown_strict, mediawiki, muse, native, odt, opml,
org, rst, t2t, textile, tikiwiki, twiki, vimwiki

1 answer:

Answer 0 (score: 1)

Although Pandoc supports .docx, it unfortunately does not currently include .dotx in its list of supported formats.

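Fortunately, because .docx and .dotx files are almost identical, you can simply change the file extension to .docx and Pandoc will accept the file. For more context, see this question: https://superuser.com/questions/1285415/difference-between-documents-with-docx-and-dotx-filename-extensions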
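If you would rather not touch the original templates, the same idea can be applied to a copy. The following is only a minimal sketch using the standard library; the helper name `docx_copy_of` and its `work_dir` parameter are hypothetical, and it assumes the copied file's contents are close enough to .docx for Pandoc to read.

```python
import shutil
import tempfile
from pathlib import Path


def docx_copy_of(template_path, work_dir=None):
    """Copy a .dotx file to a file with a .docx extension and return its path.

    The original template is left untouched. `work_dir` is a hypothetical
    parameter: when it is omitted, a fresh temporary directory is created.
    """
    template_path = Path(template_path)
    if template_path.suffix.lower() != '.dotx':
        raise ValueError(f'expected a .dotx file, got {template_path.name}')
    target_dir = Path(work_dir) if work_dir else Path(tempfile.mkdtemp())
    target = target_dir / (template_path.stem + '.docx')
    # Byte-for-byte copy; only the extension of the new file differs
    shutil.copyfile(template_path, target)
    return target
```

The returned path can then be passed to `pypandoc.convert_file` in place of the original .dotx path.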

I added some logic to your existing loop to rename any .dotx file to .docx:

import os
import pypandoc
source_dir = 'source'
result_dir = 'result'

for file in os.listdir(source_dir):
    if file.endswith('.dotx'):
        # Rename the template inside source_dir so Pandoc sees a .docx file
        filename = os.path.splitext(file)[0]
        os.rename(source_dir + '/' + file, source_dir + '/' + filename + '.docx')
        file = filename + '.docx'
    output_files1 = []
    source_file = source_dir + '/'+file
    # After the rename the name ends in .docx, so that is the suffix to replace
    output_file = result_dir + '/'+file.replace('.docx','.html').replace('.ott','.html')
    output = pypandoc.convert_file(source_file, 'html', outputfile=output_file)
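A possible variation that avoids renaming anything on disk: `pypandoc.convert_file` accepts a `format` argument, so the reader can be named explicitly instead of being inferred from the extension. This is only a sketch. The mapping `READER_FOR_EXTENSION` and the helpers `plan_conversion` and `convert_all` are hypothetical names, and it is an assumption (not verified here) that Pandoc's docx reader parses your particular .dotx files.

```python
import os

# Hypothetical mapping from file extension to the Pandoc reader to use.
# Treating .dotx as docx is an assumption based on the two formats being
# almost identical.
READER_FOR_EXTENSION = {'.dotx': 'docx', '.docx': 'docx'}


def plan_conversion(filename, source_dir='source', result_dir='result'):
    """Return (source_path, output_path, reader), or None if unsupported."""
    stem, ext = os.path.splitext(filename)
    reader = READER_FOR_EXTENSION.get(ext.lower())
    if reader is None:
        return None
    return (os.path.join(source_dir, filename),
            os.path.join(result_dir, stem + '.html'),
            reader)


def convert_all(source_dir='source', result_dir='result'):
    """Convert every supported file in source_dir to HTML in result_dir."""
    import pypandoc  # imported here so the path logic works without it
    os.makedirs(result_dir, exist_ok=True)
    for filename in os.listdir(source_dir):
        plan = plan_conversion(filename, source_dir, result_dir)
        if plan is None:
            continue  # skip files with no known reader
        source_file, output_file, reader = plan
        # format= names the reader explicitly rather than using the extension
        pypandoc.convert_file(source_file, 'html', format=reader,
                              outputfile=output_file)
```

Calling `convert_all()` with the default arguments uses the same `source` and `result` directories as the loop above, and the source files keep their original names.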

Hope this helps! Let me know if you have any questions.