Convert RTF to HTML in Java while keeping the formatting

Date: 2014-03-13 18:01:10

Tags: java html rtf jeditorpane strikethrough

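I can use a JEditorPane to parse RTF text and convert it to HTML. However, the HTML output is missing some of the formatting, namely the strike-through markup in this example. As you can see in the output, the underlined text is correctly wrapped in <u>, but the strike-through text is not wrapped in anything. Any ideas?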

public void testRtfToHtml()
{
    JEditorPane pane = new JEditorPane();
    pane.setContentType("text/rtf");

    StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");

    try
    {
        kitRtf.read(
            new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26  } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}"),
            pane.getDocument(), 0);
        kitRtf = null;

        StyledEditorKit kitHtml =
            (StyledEditorKit) pane.getEditorKitForContentType("text/html");

        Writer writer = new StringWriter();
        kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
        System.out.println(writer.toString());
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}

Output:

<html>
  <head>
    <style>
      <!--
        p.Normal {
          RightIndent:0.0;
          FirstLineIndent:0.0;
          LeftIndent:0.0;
        }
      -->
    </style>
  </head>
  <body>
    <p class=default>
              <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
This is supposed to be strike-through.
      </span>
      <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">

      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<u>Underline text here</u>
      </span>
       <span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
.?
      </span>

    </p>
  </body>
</html>
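
One way to narrow this down is to look at the document between the two steps and see which attributes the RTF reader attached to each run of text. The following is a minimal sketch, not a fix: the class name RtfAttributeDump and its methods are made up for illustration, and it uses RTFEditorKit directly instead of going through a JEditorPane. If the strike-through run carries an attribute key different from the one the underlined run uses for its style, that may be why the HTML writer does not emit markup for it.

```java
import java.io.StringReader;
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical diagnostic helper, not part of the original question.
public class RtfAttributeDump {

    /** Reads RTF into a fresh document and lists every text run with its attributes. */
    public static String dump(String rtf) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new StringReader(rtf), doc, 0);

        StringBuilder out = new StringBuilder();
        collect(doc.getDefaultRootElement(), doc, out);
        return out.toString();
    }

    /** Walks the element tree and appends one line per non-empty leaf element. */
    private static void collect(Element element, Document doc, StringBuilder out)
            throws Exception {
        if (element.isLeaf()) {
            int start = element.getStartOffset();
            // The last leaf can extend past the content by the implied newline.
            int end = Math.min(element.getEndOffset(), doc.getLength());
            if (end <= start) {
                return;
            }
            out.append('"').append(doc.getText(start, end - start).trim()).append("\" ->");
            AttributeSet attrs = element.getAttributes();
            Enumeration<?> names = attrs.getAttributeNames();
            while (names.hasMoreElements()) {
                Object key = names.nextElement();
                out.append(' ').append(key).append('=').append(attrs.getAttribute(key));
            }
            out.append('\n');
            return;
        }
        for (int i = 0; i < element.getElementCount(); i++) {
            collect(element.getElement(i), doc, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the RTF from the question.
        String rtf = "{\\rtf1\\ansi \\deff0{\\fonttbl {\\f0\\froman Times New Roman;}}"
            + "{\\plain \\fs26 \\strike This is supposed to be strike-through.} "
            + "{\\plain \\fs26 \\ul Underline text here}}";
        System.out.print(dump(rtf));
    }
}
```

Comparing the line printed for the strike-through run with the line for the underlined run shows whether the information survived the RTF import at all, or was lost only when writing HTML.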

2 answers:

Answer 0 (score: 2)

You could try this converter library, or convert with OpenOffice or LibreOffice as described in this blog post.
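
As a rough illustration of the LibreOffice route, the sketch below shells out to the LibreOffice command line in headless mode. This is not necessarily the approach taken in the linked blog post. The class name OfficeRtfToHtml, its methods, and the "soffice" binary name are assumptions; the binary must be installed and on the PATH. Whether strike-through survives depends on LibreOffice's HTML export and was not verified here.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the LibreOffice command line.
public class OfficeRtfToHtml {

    /** Builds the headless conversion command for a single RTF file. */
    static List<String> buildCommand(String officeBinary, File rtfFile, File outputDir) {
        return Arrays.asList(
            officeBinary,
            "--headless",
            "--convert-to", "html",
            "--outdir", outputDir.getAbsolutePath(),
            rtfFile.getAbsolutePath());
    }

    /** Runs the conversion and returns the expected location of the HTML file. */
    static File convert(String officeBinary, File rtfFile, File outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(officeBinary, rtfFile, outputDir))
            .inheritIO()
            .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Conversion failed with exit code " + exitCode);
        }
        // LibreOffice keeps the base name and swaps the extension.
        String baseName = rtfFile.getName().replaceFirst("\\.[^.]+$", "");
        return new File(outputDir, baseName + ".html");
    }

    public static void main(String[] args) throws Exception {
        // Usage: java OfficeRtfToHtml input.rtf output-directory
        File html = convert("soffice", new File(args[0]), new File(args[1]));
        System.out.println("Wrote " + html);
    }
}
```

Starting an office process per file is slow, so this suits batch jobs better than per-request conversion.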

Answer 1 (score: 0)

This is the function I use to convert the RTF from a .msg body to HTML. See my Outlook message parser repository, yamp, on GitHub.

public static String rtfToHtml(String rtfText) {
    if (rtfText != null) {
        rtfText = rtfText.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*(.*)}", "$1")
            .replaceAll("\\\\htmlrtf[1]?(.*)\\\\htmlrtf0", "")
            .replaceAll("\\\\htmlrtf[01]?", "")
            .replaceAll("\\\\htmlbase", "")
            .replaceAll("\\\\par", "\n")
            .replaceAll("\\\\tab", "\t")
            .replaceAll("\\\\line", "\n")
            .replaceAll("\\\\page", "\n\n")
            .replaceAll("\\\\sect", "\n\n")
            // Code points below are hexadecimal, so the references need the &#x prefix.
            .replaceAll("\\\\emdash", "&#x2014;")
            .replaceAll("\\\\endash", "&#x2013;")
            .replaceAll("\\\\emspace", "&#x2003;")
            .replaceAll("\\\\enspace", "&#x2002;")
            .replaceAll("\\\\qmspace", "&#x2005;")
            .replaceAll("\\\\bullet", "&#x2022;")
            .replaceAll("\\\\lquote", "&#x2018;")
            .replaceAll("\\\\rquote", "&#x2019;")
            .replaceAll("\\\\ldblquote", "&#x201C;")
            .replaceAll("\\\\rdblquote", "&#x201D;")
            .replaceAll("\\\\row", "\n")
            .replaceAll("\\\\cell", "|")
            .replaceAll("\\\\nestcell", "|")
            .replaceAll("([^\\\\])\\{", "$1")
            .replaceAll("([^\\\\])}", "$1")
            .replaceAll("[\\\\](\\{)", "$1")
            .replaceAll("[\\\\](})", "$1")
            .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
            .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
            .replaceAll("\"cid:(.*)@.*\"", "\"$1\"");

        int index = rtfText.indexOf("<html");
        if (index != -1) {
            return rtfText.substring(index);
        }
    }

    return null;
}
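
To see what the two character-escape rules near the end of that chain do, they can be tried in isolation. The sketch below copies just those two replaceAll calls into a hypothetical helper (RtfEscapeDemo and decodeEscapes are illustrative names, not part of the answer's code). It turns RTF decimal Unicode escapes into decimal character references and RTF hex byte escapes into hexadecimal ones.

```java
// Hypothetical demo class that isolates two rules from the answer's function.
public class RtfEscapeDemo {

    /** Applies only the Unicode-escape and hex-escape rules to an RTF fragment. */
    static String decodeEscapes(String rtfFragment) {
        return rtfFragment
            // Decimal Unicode escape (backslash, u, digits) becomes &#digits;
            .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
            // Hex byte escape (backslash, apostrophe, two hex digits) becomes &#xhh;
            .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;");
    }

    public static void main(String[] args) {
        // prints: caf&#xe9; costs &#8364;?
        System.out.println(decodeEscapes("caf\\'e9 costs \\u8364?"));
    }
}
```

Note that the fallback character RTF places after a Unicode escape (the ? in this input) is left in the output, and negative escape values are not matched, so these rules are only an approximation of RTF character decoding.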