I have an XML file that I parse into a Document, and I need to read an XML attribute whose value contains embedded tabs and newlines.
From searching, I found that the XML specification says attribute values are "normalized": each whitespace character (tab, line feed, carriage return) is replaced with a space.
I guess I have to replace the tabs and line breaks with an appropriate escaped form before I parse the XML.
In all my searching I have not found a straightforward way to get from the File to a Document in which the attribute text is returned with tabs and line breaks preserved.
The XML file is generated by a third-party application, so the problem cannot be fixed at the source.
I want to use the JDK parser.
My initial attempts at reading the File into a String and parsing the String fail with a parse error on the first byte.
Any suggestions for a straightforward approach?
An example element is on Pastebin: https://pastebin.com/pc9uGbSD
I parse the XML like this:
public ReadPlexExport(Path xmlPath, ExportType exType) throws Exception {
    this.xmlPath = xmlPath;
    this.type = exType;
    this.doc = DBF.newDocumentBuilder().parse(this.xmlPath.toFile());
}
Answer 0 (score: 0)
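My quick and dirty solution to the immediate problem is to read the XML file line by line as a text file, replace the \t characters on each line with the escaped tab value, write the line to a new file, and then append an escaped newline.
The new XML file can then be parsed. The original XML will always be in a form that allows this hack, because \t and line breaks only ever occur inside attributes.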
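A slightly more targeted variant of this idea can be sketched in Java. This is not the code used above, only an illustration under some assumptions: the whole file fits in memory, and the input contains no comments, CDATA sections, or DOCTYPE declarations with quote characters in them, because the small scanner only tracks whether it is inside a tag and inside a quoted value. The class name `AttrWhitespaceEscaper` and its two methods are hypothetical. Tabs, line feeds, and carriage returns inside attribute values are rewritten as character references (`&#9;`, `&#10;`, `&#13;`), which attribute-value normalization leaves as the referenced characters instead of turning them into spaces. The sketch also strips a leading byte order mark, which is one possible (unconfirmed) cause of a parse error at the very first character when parsing from a String.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the JDK or of the original post.
class AttrWhitespaceEscaper {

    /** Replaces tab, LF and CR inside quoted attribute values with character references. */
    static String escapeAttributeWhitespace(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 64);
        boolean inTag = false; // true between '<' and '>'
        char quote = 0;        // the opening quote character, or 0 when outside a value
        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                // Inside an attribute value: escape whitespace, watch for the closing quote.
                if (c == quote) {
                    quote = 0;
                    out.append(c);
                } else if (c == '\t') {
                    out.append("&#9;");
                } else if (c == '\n') {
                    out.append("&#10;");
                } else if (c == '\r') {
                    out.append("&#13;");
                } else {
                    out.append(c);
                }
            } else if (inTag) {
                // Inside a tag but outside a value: a quote opens a value, '>' ends the tag.
                if (c == '"' || c == '\'') {
                    quote = c;
                } else if (c == '>') {
                    inTag = false;
                }
                out.append(c);
            } else {
                // Element content: copied unchanged.
                if (c == '<') {
                    inTag = true;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses XML held in a String with the JDK parser. */
    static Document parse(String xml) throws Exception {
        // Assumption: a leading BOM character may be present after decoding the file.
        if (xml.startsWith("\uFEFF")) {
            xml = xml.substring(1);
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Wrap the text in an InputSource; parse(String) would treat it as a URI.
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String raw = "<root><item note=\"line one\n\tline two\"/></root>";
        Document doc = parse(escapeAttributeWhitespace(raw));
        String note = doc.getDocumentElement().getFirstChild()
                .getAttributes().getNamedItem("note").getNodeValue();
        // Checks that the tab and line break survived parsing.
        System.out.println(note.equals("line one\n\tline two"));
    }
}
```

To apply this to a file, the text could be read with `new String(Files.readAllBytes(xmlPath), StandardCharsets.UTF_8)`, assuming the export is UTF-8 encoded, and then passed through `escapeAttributeWhitespace` before `parse`. Note that a Windows line ending inside an attribute comes back as the two characters `\r\n`, since both are escaped individually.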