Question

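I receive XML input from a web API. When I save it as an XML file from the browser, the text contains extra &#xD; character references. The problem is that after I parse this XML with StAX, do some processing, and write the result back out in a different XML format through a DOM, they turn into &#13;.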

All I want is to avoid the extra &#xD; in the input and the &#13; in the output. I have not been able to find the cause of either, or a fix.

This is what an element value in the input XML looks like after saving it to a file:

Today is a fine day.&#xD;
&#xD;
So does everyday.

After writing, the output is:

Today is a fine day.&#13;
&#13;
So does everyday.

The expected and required output is:

<someNode>Today is a fine day.

So does everyday.
</someNode>

The line breaks in the node's text value are intentional and must be preserved as they are.

Simplified code example:

Reading the stream from the API:

// Get Input XML stream from API
URL apiURL = new URL(API_Url);
HttpsURLConnection httpsAPIURLConn;
httpsAPIURLConn = (HttpsURLConnection) apiURL.openConnection();
httpsAPIURLConn.setConnectTimeout(10000); // timeout
httpsAPIURLConn.setDoInput(true);
InputStream inStream = httpsAPIURLConn.getInputStream();

// Data stream is available; start the StAX XLIFF reader
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
// Coalesce adjacent character data (including resolved references) into one text event
xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);

// StAX stream reader
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new BufferedInputStream(inStream), "UTF-8");

// Read and load XML data to in-memory database to filter and process

After filtering and processing the raw data, writing the new XML file:

// After processing and writing new Element structure to org.w3c.dom.Document
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer tr = transformerFactory.newTransformer();
tr.setOutputProperty(OutputKeys.INDENT, "yes");
tr.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
tr.setOutputProperty(OutputKeys.METHOD, "xml");
tr.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tr.setOutputProperty(OutputKeys.STANDALONE, "no");

DOMSource source = new DOMSource(doc);
File file = new File(xmlFilePath);
// try-with-resources flushes and closes the writer even if transform() throws
try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")) {
    tr.transform(source, new StreamResult(writer));
}

I am not sure what I am missing. Any suggestions or help would be great.

Answer 1

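The simplest solution (short of hooking into the SAX event stream) is to write an XSLT script that does exactly what you need, and to use it as the transformer instead of the default identity transformer.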

For a starting point, see http://en.wikipedia.org/wiki/Identity_transform#Using_XSLT.

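You then need to supply your own rule for text nodes, in which you can remove the ASCII 13 (carriage return) characters by translating them to the empty string. For details, see https://stackoverflow.com/a/5084382/53897.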
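
The approach described above could look roughly like the following. This is a minimal sketch, assuming the JDK's built-in JAXP implementation; the class name CrStripper, the method stripCr, and the inline stylesheet are illustrative and do not come from the linked answer. The stylesheet is the identity transform plus a text() rule, given an explicit priority so it wins over the generic copy rule, that uses translate() to delete U+000D.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: copies a document unchanged except for carriage returns in text nodes.
final class CrStripper {

    // Identity transform plus one rule that removes U+000D from every text node.
    // priority='1' avoids a conflict with the node() rule, which has the same default priority.
    static final String STRIP_CR_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='text()' priority='1'>"
      + "<xsl:value-of select=\"translate(., '&#xD;', '')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String stripCr(String xml) throws Exception {
        // Parsing resolves &#xD; into a literal carriage return inside the DOM text node.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Build the transformer from the stylesheet instead of using the default identity one.
        Transformer tr = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_CR_XSLT)));
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        tr.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input =
            "<someNode>Today is a fine day.&#xD;\n&#xD;\nSo does everyday.\n</someNode>";
        System.out.println(stripCr(input));
    }
}
```

In the question's code, the same stylesheet could presumably be passed to transformerFactory.newTransformer(new StreamSource(...)) in place of the no-argument call, keeping the existing output properties.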
