Making an XSLT transformation recognize ASCII character codes

Asked: 2017-04-13 18:41:39

Tags: java xml xslt

I have an XSLT stylesheet that converts an HTML table to CSV. It is defined as follows:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:fo="http://www.w3.org/1999/XSL/Format" >
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
        <xsl:for-each select="//tr">
            <xsl:for-each select="td">
                <xsl:if test="position() > 1">,</xsl:if>
                <xsl:value-of select="."/>
            </xsl:for-each>
            <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

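The problem I am facing now is that the table's tags are written as ASCII codes: the angle brackets arrive escaped as &lt; and &gt;, so the parser sees plain text instead of elements.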

Sample input:

&lt;table&gt;&lt;tr&gt;
        &lt;th&gt;Order ID&lt;/th&gt;
        &lt;th&gt;Item ID&lt;/th&gt;
        &lt;th&gt;Participant ID&lt;/th&gt;
        &lt;th&gt;Status&lt;/th&gt;
        &lt;th&gt;Shipping Provider&lt;/th&gt;
        &lt;th&gt;Tracking Number&lt;/th&gt;
        &lt;th&gt;Shipped Date&lt;/th&gt;
        &lt;th&gt;Shipping Method&lt;/th&gt;&lt;/tr&gt;
            &lt;tr&gt;
            &lt;td align="center"&gt; Choice_DJ4&lt;/td&gt;
            &lt;td align="center"&gt; 4&lt;/td&gt;
            &lt;td align="center"&gt; DXM09902&lt;/td&gt;
            &lt;td align="center"&gt; Shipped&lt;/td&gt; 
            &lt;td align="center"&gt; USPS&lt;/td&gt; 
            &lt;td align="center"&gt; &lt;/td&gt; 
            &lt;td align="center"&gt; 04/13/2017&lt;/td&gt; 
            &lt;td align="center"&gt; Standard Ground&lt;/td&gt; 
            &lt;/tr&gt;
    &lt;/table&gt;

My question: is there a way to make the XSL file recognize these ASCII codes as the characters they stand for?

Update: here is my Java code:

String data = readFile("config/email.xml");

System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));

String configFile = "config/email-xslt.xsl";

File stylesheet = new File(configFile);

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);

StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);

transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);

2 answers:

Answer 0 (score: 0)

With XSLT 3.0, you can use unparsed-text() to load the text, parse-xml-fragment() to unescape the entities, and parse-xml() to parse the resulting XML string.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
    <xsl:template match="/">
        <!--first, load the contents of the document (adjust path to your document) -->
        <xsl:variable name="input" select="unparsed-text('table.txt')" as="item()"/>
        <!--second, unescape the angle bracket entities -->
        <xsl:variable name="table-text" select="parse-xml-fragment($input)" as="item()"/>
        <!--third, parse the serialized XML string -->
        <xsl:variable name="table" select="parse-xml($table-text)" as="item()"/>
        <xsl:for-each select="$table//tr">
            <!--a more simplified way of generating the CSV for each row -->
            <xsl:value-of select="td" separator=","/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
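
Note that the transformer bundled with the JDK implements XSLT 1.0, so the stylesheet above needs an XSLT 3.0 processor such as Saxon. For readers who want to stay on the JDK alone, the same two-pass idea (let an XML parser resolve the entities, then parse the resulting string) can be sketched in plain Java. This is only a rough sketch: the class name TwoPassParse, its method names, and the <wrapper> root element are made up for illustration and are not part of the answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Two-pass parsing sketch: unescape via the XML parser, then parse the result. */
class TwoPassParse {

    /** Parses a string as an XML document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Turns escaped markup such as "&lt;table&gt;..." into a parsed DOM. */
    static Document parseEscaped(String escaped) {
        // Pass 1: wrap the text in a hypothetical <wrapper> root so it forms a
        // well-formed document; the parser resolves &lt; and &gt; while reading it.
        Document first = parse("<wrapper>" + escaped + "</wrapper>");
        String markup = first.getDocumentElement().getTextContent();
        // Pass 2: the unescaped text is now real markup and can be parsed normally.
        return parse(markup.trim());
    }

    public static void main(String[] args) {
        String escaped = "&lt;table&gt;&lt;tr&gt;&lt;td align=\"center\"&gt; USPS&lt;/td&gt;"
                + "&lt;/tr&gt;&lt;/table&gt;";
        Document table = parseEscaped(escaped);
        // Prints the number of td elements found after the second pass.
        System.out.println(table.getElementsByTagName("td").getLength());
    }
}
```

The Document returned by parseEscaped can be wrapped in a DOMSource and handed to the XSLT 1.0 stylesheet from the question unchanged.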

Answer 1 (score: 0)

I was finally able to solve the problem, using org.apache.commons.lang3.StringEscapeUtils.unescapeXml(str);

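My XSL file and input data (config/email.xml) are still the ones from the original post, but I had to modify the Java code to unescape these characters before passing the data to the XSL transformer.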

String data = readFile("config/email.xml");
data = StringEscapeUtils.unescapeXml(data);
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));

String configFile = "config/email-xslt.xsl";

File stylesheet = new File(configFile);

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);

StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
     .newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);

transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
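
To try the unescape-then-transform approach without adding commons-lang3, a dependency-free sketch might look like the following. The assumptions here are mine: unescapeBasicXml is a hypothetical stand-in that handles only the five predefined XML entities (it is not the library's implementation), the stylesheet is inlined as a string for the demo, and a StreamSource replaces the DocumentBuilder/DOMSource pair to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: unescape entity-encoded HTML, then run the table-to-CSV stylesheet. */
class EscapedTableToCsv {

    // Same row/cell logic as the stylesheet in the question, inlined for the demo.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//tr\">"
        + "<xsl:for-each select=\"td\">"
        + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
        + "<xsl:value-of select=\".\"/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#xA;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Hypothetical stand-in for unescapeXml: handles only the five predefined entities. */
    static String unescapeBasicXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
    }

    /** Unescapes the input and applies the stylesheet, returning the CSV text. */
    static String toCsv(String escapedHtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(unescapeBasicXml(escapedHtml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "&lt;table&gt;&lt;tr&gt;&lt;td&gt;USPS&lt;/td&gt;"
                + "&lt;td&gt;Shipped&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;";
        System.out.print(toCsv(input)); // prints the line: USPS,Shipped
    }
}
```

One thing to watch for: the stylesheet selects only td cells, so a header row made of th cells produces an empty line in the output. Selecting th|td instead would include the headers, if that is wanted.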