I have an XSLT that converts an HTML table to CSV. It is defined as follows:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="//tr">
<xsl:for-each select="td">
<xsl:if test="position() > 1">,</xsl:if>
<xsl:value-of select="."/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
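The problem I am facing now is that the table's tags are written as ASCII character codes rather than as literal characters.
Sample input: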
<table><tr>
<th>Order ID</th>
<th>Item ID</th>
<th>Participant ID</th>
<th>Status</th>
<th>Shipping Provider</th>
<th>Tracking Number</th>
<th>Shipped Date</th>
<th>Shipping Method</th></tr>
<tr>
<td align="center"> Choice_DJ4</td>
<td align="center"> 4</td>
<td align="center"> DXM09902</td>
<td align="center"> Shipped</td>
<td align="center"> USPS</td>
<td align="center"> </td>
<td align="center"> 04/13/2017</td>
<td align="center"> Standard Ground</td>
</tr>
</table>
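My question is: is there a way to make the XSL file recognize the ASCII codes as the characters they are meant to represent?
Update: here is my Java code: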
String data = readFile("config/email.xml");
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));
String configFile = "config/email-xslt.xsl";
File stylesheet = new File(configFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);
transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
Answer 0 (score: 0)
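With XSLT 3.0, you can use unparsed-text() to load the text, parse-xml-fragment() to unescape the entities, and parse-xml() to parse the resulting XML string. Note that this needs an XSLT 3.0 processor (for example Saxon); the transformer built into the JDK only implements XSLT 1.0.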
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="text" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<!--first, load the contents of the document (adjust path to your document) -->
<xsl:variable name="input" select="unparsed-text('table.txt')" as="item()"/>
<!--second, unescape the angle bracket entities -->
<xsl:variable name="table-text" select="parse-xml-fragment($input)" as="item()"/>
<!--third, parse the serialized XML string -->
<xsl:variable name="table" select="parse-xml($table-text)" as="item()"/>
<xsl:for-each select="$table//tr">
<!--a more simplified way of generating the CSV for each row -->
<xsl:value-of select="td" separator=","/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
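The same two-pass idea (let a parser decode the entities first, then parse the decoded text as markup) can also be sketched in plain Java with only the JDK's DOM parser. This is a minimal sketch under assumptions: the class name `TwoPassParse`, the method names, and the `<wrapper>` element are hypothetical, and the input is assumed to be nothing but the escaped table text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TwoPassParse {

    // Parse a string of XML into a DOM document; failures are rethrown unchecked.
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Pass 1: wrap the escaped text in a hypothetical <wrapper> element so the
    // parser decodes &lt; and &gt; into literal characters inside a text node.
    // Pass 2: parse that decoded text as the real table markup.
    public static Document parseEscapedTable(String escaped) {
        Document outer = parse("<wrapper>" + escaped + "</wrapper>");
        String markup = outer.getDocumentElement().getTextContent();
        return parse(markup);
    }

    public static void main(String[] args) {
        // Assumed shape of the escaped input, modelled on the sample in the question.
        String escaped = "&lt;table&gt;&lt;tr&gt;"
                + "&lt;td align=\"center\"&gt; USPS&lt;/td&gt;"
                + "&lt;td align=\"center\"&gt; 4&lt;/td&gt;"
                + "&lt;/tr&gt;&lt;/table&gt;";
        Document table = parseEscapedTable(escaped);
        NodeList cells = table.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            System.out.println(cells.item(i).getTextContent().trim());
        }
    }
}
```

The resulting Document can be handed to the existing XSLT 1.0 stylesheet through a DOMSource, so the stylesheet itself would not need to change.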
Answer 1 (score: 0)
I was finally able to solve the problem using org.apache.commons.lang3.StringEscapeUtils.unescapeXml(str);
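My XSL file and input data (config/email.xml) are still the same as in the original post, but I had to modify the Java code to unescape those characters before passing the data to the XSL transformer.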
String data = readFile("config/email.xml");
data = StringEscapeUtils.unescapeXml(data);
System.out.println("Data: \n" + data);
InputSource is = new InputSource(new StringReader(data));
String configFile = "config/email-xslt.xsl";
File stylesheet = new File(configFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(is);
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(stylesource);
Source source = new DOMSource(document);
StringWriter sw = new StringWriter();
Result outputTarget = new StreamResult(sw);
transformer.transform(source, outputTarget);
data = sw.toString();
System.out.println("Output: " + data);
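For readers who want to try the unescape-then-transform flow without Commons Lang on the classpath, here is a self-contained sketch that uses only the JDK. It is an illustration, not the code from this answer: the class `EscapedTableToCsv` and the helper `unescapeBasicEntities` are hypothetical, the helper is a simplified stand-in for `StringEscapeUtils.unescapeXml` that handles only the five predefined XML entities, and the stylesheet is inlined as a string instead of being read from config/email-xslt.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EscapedTableToCsv {

    // Same row/cell logic as the stylesheet in the question, inlined for the sketch.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//tr\">"
        + "<xsl:for-each select=\"td\">"
        + "<xsl:if test=\"position() &gt; 1\">,</xsl:if>"
        + "<xsl:value-of select=\".\"/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stand-in for StringEscapeUtils.unescapeXml: decodes only the
    // five predefined XML entities. &amp; is replaced last so that "&amp;lt;"
    // becomes "&lt;" instead of "<".
    public static String unescapeBasicEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // Unescape the table markup, then run it through the XSLT 1.0 stylesheet.
    public static String toCsv(String escapedTable) {
        try {
            String markup = unescapeBasicEntities(escapedTable);
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(markup)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "&lt;table&gt;&lt;tr&gt;"
                + "&lt;td&gt;Choice_DJ4&lt;/td&gt;&lt;td&gt;Shipped&lt;/td&gt;"
                + "&lt;/tr&gt;&lt;/table&gt;";
        System.out.print(toCsv(escaped));
    }
}
```

As in the original stylesheet, only td cells are emitted, so a header row made of th cells would come out as an empty line.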