Question

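I am trying to use Apache Tika, Spring, and Thymeleaf to retrieve an MS Word document as HTML/XHTML as-is, but I cannot get elements such as images and tables to come through.

I followed the examples in the documentation at http://tika.apache.org/1.20/examples.html.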

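    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.ToXMLContentHandler;
    import org.xml.sax.ContentHandler;

    // Parses the document and returns Tika's XHTML output as a string.
    public String getTikaTest() throws Exception {
        ContentHandler handler = new ToXMLContentHandler();
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        try (InputStream stream = new FileInputStream("/home/folder1/test.docx")) {
            parser.parse(stream, handler, metadata);
            return handler.toString();
        }
    }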

...

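    // modelAndView and testService are declared elsewhere (not shown here).
    @GetMapping({ "/document" })
    public ModelAndView test() throws Exception {
        // getTikaTest() declares a checked Exception, so it is propagated here.
        modelAndView.addObject("test", testService.getTikaTest());
        return modelAndView;
    }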

...

    <div th:fragment="document">
        <div th:utext="${test}"></div>
    </div>

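Plain text works fine, but I cannot get elements such as images to show up on the web page. I can see them in the handler object as embedded tags, for example "image1.png", but I don't know how to make this work at the view level. Thanks in advance.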
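One direction I am considering, as a rough sketch rather than a confirmed solution: rewrite the image references in the XHTML string before handing it to Thymeleaf, so the browser receives the image bytes inline as data URIs. The sketch below uses only the JDK and rests on two assumptions. First, that Tika writes the references as `src="embedded:image1.png"` (this matches the tags I see, but I have not verified it for every format). Second, that the image bytes have already been collected into a map keyed by file name, for example by registering a custom `EmbeddedDocumentExtractor` in the `ParseContext` passed to `parser.parse(...)`; that collection step is not shown. The class name `EmbeddedImageInliner` and its methods are hypothetical.

```java
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedImageInliner {

    // Assumption: image references look like src="embedded:<file name>".
    private static final Pattern EMBEDDED_SRC =
            Pattern.compile("src=\"embedded:([^\"]+)\"");

    /**
     * Replaces each src="embedded:NAME" with a base64 data URI when bytes
     * for NAME are present in the map. Unknown names are left unchanged.
     */
    static String inline(String xhtml, Map<String, byte[]> imagesByName) {
        Matcher matcher = EMBEDDED_SRC.matcher(xhtml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            byte[] bytes = imagesByName.get(name);
            String replacement = matcher.group(); // default: keep as-is
            if (bytes != null) {
                replacement = "src=\"data:" + guessMimeType(name) + ";base64,"
                        + Base64.getEncoder().encodeToString(bytes) + "\"";
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    // Crude guess from the file extension; enough for a sketch.
    private static String guessMimeType(String name) {
        String lower = name.toLowerCase();
        if (lower.endsWith(".png")) return "image/png";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        return "application/octet-stream";
    }
}
```

With something like this, the service would return `EmbeddedImageInliner.inline(handler.toString(), images)` instead of the raw handler output, and the existing `th:utext` template would stay the same. Data URIs make the page heavier for large images, so serving the images from a separate controller endpoint might be a better fit for big documents.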

Java Spring / Apache Tika / Thymeleaf - Text and images from MS Word to XHTML
