Getting HTML or XHTML text inside the value of an XML element

Date: 2018-10-12 18:10:15

Tags: java xslt xml-parsing jsoup jdom

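My XML document has a projectDetails node that is created with the Java JDOM API; the data inside the node comes from a database.
The problem is that the description field is stored in the database as HTML. When I add it to the <descriptionDetails /> element and transform the document with Java's Transformer class, all of the HTML tags are escaped.
Is it possible to get the HTML code as children of descriptionDetails, like the rest of the tags, without it being escaped?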

  <projectDetails label="label.projectDetails">
    <descriptionDetails label="label.descriptionDetails">
    &lt;html&gt;
 &lt;head&gt;&lt;/head&gt;
 &lt;body&gt;
  &lt;strong&gt;&lt;strong&gt; Tiny MCE Bold&lt;br /&gt;&lt;em&gt;Tiny MCE Bold/Itellic&lt;/em&gt;&lt;br /&gt;&lt;span style="text-decoration: underline;"&gt;&lt;em&gt;Tiny MCE Bold/Itellic/Underlined&lt;/em&gt;&lt;/span&gt;&lt;br /&gt;&lt;/strong&gt;&lt;/strong&gt; 
  &lt;div&gt;
   Lorem Ipsum&amp;nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown 
   &lt;br /&gt;
   &lt;br /&gt;
   &lt;span style="color: #ff0000;"&gt;printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset &lt;span style="color: #ffffff; background-color: #808000;"&gt;&lt;span style="background-color: #808000;"&gt;sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum,.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;/div&gt; 
  &lt;h1&gt;H1 heading&lt;/h1&gt; 
  &lt;h2&gt;H1 heading&lt;/h2&gt; 
  &lt;h3&gt;H1 heading&lt;/h3&gt; 
  &lt;h4&gt;H1 heading&lt;/h4&gt; 
  &lt;h5&gt;H1 heading&lt;/h5&gt; 
  &lt;h6&gt;H1 heading&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: 14pt;"&gt;font size 14&lt;/span&gt;&lt;/h6&gt;
 &lt;/body&gt;
&lt;/html&gt;
</descriptionDetails>
 </projectDetails>

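import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Serializes the document with the default (identity) Transformer.
private static String xmlAsString(Document xml) throws Exception {
    Transformer tf = TransformerFactory.newInstance().newTransformer();

    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    tf.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
    tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    Writer out = new StringWriter();
    tf.transform(new DOMSource(xml), new StreamResult(out));
    return out.toString();
}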

Expected output

<projectDetails label="label.projectDetails">
    <descriptionDetails label="label.descriptionDetails">
    <html>
 <head></head>
 <body>
  <strong><strong> Tiny MCE Bold<br /><em>Tiny MCE Bold/Itellic</em><br /><span style="text-decoration: underline;"><em>Tiny MCE Bold/Itellic/Underlined</em></span><br /></strong></strong> 
  <div>
   Lorem Ipsum&nbsp;is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown 
   <br />
   <br />
   <span style="color: #ff0000;">printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset <span style="color: #ffffff; background-color: #808000;"><span style="background-color: #808000;">sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum,.</span><br /></span></span>
  </div> 
  <h1>H1 heading</h1> 
  <h2>H1 heading</h2> 
  <h3>H1 heading</h3> 
  <h4>H1 heading</h4> 
  <h5>H1 heading</h5> 
  <h6>H1 heading<br /><br /><span style="font-size: 14pt;">font size 14</span></h6>
 </body>
</html>
</descriptionDetails>
 </projectDetails>

1 answer:

Answer 0 (score: 1)

Instead of using the default Transformer from newInstance().newTransformer(), you can use https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/TransformerFactory.html#newTransformer-javax.xml.transform.Source- to create one from a stylesheet:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="descriptionDetails/text()">
      <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>

Pass the stylesheet above as the Source.

https://xsltfiddle.liberty-development.net/nc4NzR7
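
A minimal sketch of how that stylesheet might be wired into the question's xmlAsString method, assuming the document is an org.w3c.dom.Document and the stylesheet is kept in a string constant. The class name DescriptionUnescaper, the constant STYLESHEET, and the parse helper are hypothetical; in practice the stylesheet could just as well be loaded from a file or classpath resource. Keep in mind that disable-output-escaping is an optional serializer feature in XSLT 1.0, so it only has an effect when the transformer itself writes the output, as with the StreamResult used here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DescriptionUnescaper {

    // The identity transform from the answer, plus the template that writes
    // the text of descriptionDetails without escaping it.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='@* | node()'>"
      + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='descriptionDetails/text()'>"
      + "<xsl:value-of select='.' disable-output-escaping='yes'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Same shape as the question's method, but the Transformer is built
    // from the stylesheet instead of being the default identity one.
    static String xmlAsString(Document xml) throws Exception {
        Transformer tf = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Writer out = new StringWriter();
        tf.transform(new DOMSource(xml), new StreamResult(out));
        return out.toString();
    }

    // Helper for the demo only: builds a DOM document from a string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<projectDetails label=\"label.projectDetails\">"
                   + "<descriptionDetails label=\"label.descriptionDetails\">"
                   + "&lt;strong&gt;Tiny MCE Bold&lt;/strong&gt;"
                   + "</descriptionDetails></projectDetails>";
        System.out.println(xmlAsString(parse(xml)));
    }
}
```

The indentation properties from the question can be set on this Transformer in the same way; they are left out here only to keep the sketch short.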

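Note, however, that your escaped HTML contains entity references such as &nbsp;. Once the text is written out unescaped, that reference makes your output malformed XML, because the entity is not predefined in XML.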
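
One possible workaround, sketched under the assumption that the description string can be preprocessed before it is added to descriptionDetails, is to rewrite named HTML entities as numeric character references, which any XML parser accepts. The class HtmlEntityFixer, the method toNumericEntities, and the small entity table are hypothetical and deliberately incomplete; a real application would need the full HTML entity list or an HTML library, and would still have to deal with stored HTML that is not well-formed for other reasons.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlEntityFixer {

    // Matches named references such as &nbsp; but not numeric ones like &#160;.
    private static final Pattern NAMED_ENTITY =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Illustrative subset of HTML entities and their code points.
    // The five XML-predefined entities (amp, lt, gt, quot, apos) are
    // intentionally absent so they are left untouched.
    private static final Map<String, Integer> HTML_ENTITIES = new HashMap<>();
    static {
        HTML_ENTITIES.put("nbsp", 160);
        HTML_ENTITIES.put("copy", 169);
        HTML_ENTITIES.put("reg", 174);
    }

    // Replaces known named entities with numeric character references;
    // unknown names are copied through unchanged.
    static String toNumericEntities(String html) {
        Matcher m = NAMED_ENTITY.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Integer codePoint = HTML_ENTITIES.get(m.group(1));
            String replacement = (codePoint == null)
                    ? m.group()
                    : "&#" + codePoint + ";";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("Lorem Ipsum&nbsp;is simply dummy text"));
    }
}
```

Applied to the description before it goes into the element, this would make the unescaped output carry &#160; where the stored HTML had &nbsp;.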