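Parse error when parsing a document with an org.w3c.dom.Document object: an invalid XML character (Unicode: 0x1a) was found in the element content of the document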

Date: 2014-04-16 05:44:52

Tags: java xml

I am getting this error: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 14515; An invalid XML character (Unicode: 0x1a) was found in the element content of the document

The part of the XML file that triggers the error:

 <Product>
          <Description>672577000 3M 4540 DISPOSABLE COVERALL → XL</Description>
 </Product>

I get this error when parsing the document into an org.w3c.dom.Document object; the → in the input file is what causes it. How can I resolve this?

2 answers:

Answer 0: (score: 0)

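Not every character is allowed in an XML file. Here is a link where you can look up which characters are allowed, which are discouraged, and which are not allowed at all: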

http://en.wikipedia.org/wiki/Valid_characters_in_XML

Yours (shown as →, reported in the error message as Unicode 0x1a) is not allowed.
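
As a minimal sketch of the rule that page describes: XML 1.0 allows only tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The class and method names below (`XmlChars`, `isValidXml10`, `stripInvalid`) are hypothetical and chosen only for illustration; the sample string is made up.

```java
public final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point falls in the XML 1.0 allowed character ranges. */
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies the input, dropping every code point that XML 1.0 does not allow. */
    public static String stripInvalid(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on whether cp is supplementary
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample with a 0x1A (SUB) character embedded in the text
        String raw = "COVERALL " + (char) 0x1A + " XL";
        System.out.println(isValidXml10(0x1A));
        System.out.println(stripInvalid(raw));
    }
}
```

Note that a genuine → (U+2192) passes this check; only the 0x1a named in the error message is rejected.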

Answer 1: (score: 0)

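I resolved this with the code below:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String removedUnicodeChar = "DISPOSABLE COVERALL → XXL</Description></Order> ↔ ↕ ↑ ↓ → ABC";
// Match control characters and U+FFFD. No '|' inside the class: there it is a literal pipe.
// Note: \p{Cntrl} also matches tab, CR and LF, which XML itself allows.
Pattern pattern = Pattern.compile("[\\p{Cntrl}\\uFFFD]");
Matcher m = pattern.matcher(removedUnicodeChar);
if (m.find()) {
    System.out.println("Control Characters found");
    removedUnicodeChar = m.replaceAll("");
}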
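
To tie this back to the question, the cleaned string can then be handed to the parser. The following is a rough sketch under stated assumptions, not a tested fix for the asker's file: the class name `SanitizedParse` and the sample XML are made up, and the regex is a narrower variant that keeps tab, line feed and carriage return while removing the other C0 control characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public final class SanitizedParse {
    // C0 controls other than tab (0x09), LF (0x0A) and CR (0x0D) are invalid in XML 1.0
    private static final String INVALID_CONTROLS = "[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]";

    private SanitizedParse() {}

    /** Removes the invalid control characters, then parses the result into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        String cleaned = xml.replaceAll(INVALID_CONTROLS, "");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(cleaned)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input with a 0x1A embedded in the element content
        String xml = "<Product><Description>COVERALL " + (char) 0x1A
                + " XL</Description></Product>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Stripping the characters silently changes the data, so whether to delete them or replace them with a visible placeholder depends on what the downstream consumer expects.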