Unmarshalling XML containing illegal XML characters in Java

Posted: 2017-12-25 12:01:07

Tags: java xml jaxb jaxb2

I'm trying to unmarshal an XML string with javax.xml.bind.Unmarshaller, but I get the following error:

Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
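The `Caused by` line shows the failure comes from the SAX parser underneath JAXB, not from the binding itself: XML 1.0 forbids control characters below U+0020 other than tab, line feed and carriage return, so a literal U+0013 makes the document not well-formed. A minimal reproduction, assuming the JDK's built-in DOM parser (`javax.xml.parsers`) is used in place of JAXB so the sketch has no dependencies; the class name `InvalidCharDemo` is hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether the JDK parser accepts a string as XML.
public class InvalidCharDemo {
    // Returns true if the JDK's built-in DOM parser accepts the string as well-formed XML.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // Not well-formed, e.g. an illegal character such as U+0013 in element content
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // (char) 0x13 avoids putting a raw control character in the source file
        String bad = "<root>a" + (char) 0x13 + "b</root>";
        System.out.println(parses(bad));               // false
        System.out.println(parses("<root>ab</root>")); // true
    }
}
```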

Is there a general solution for removing all illegal XML characters from the input string?

For example, I tried the following, but it didn't help:

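// Characters to KEEP (the [^...] negates the set): U+0001-U+D7FF, U+E000-U+FFFD,
// and U+10000-U+10FFFF (written as surrogate pairs), i.e. the XML 1.1 Char range.
public static String illegalXML11CharactersPattern = "[^"
        + "\u0001-\uD7FF"
        + "\uE000-\uFFFD"
        + "\ud800\udc00-\udbff\udfff"
        + "]+";

// Deletes every run of characters outside the ranges above.
public static String stripNonValidXML11Characters(String xml) {
    return xml.replaceAll(illegalXML11CharactersPattern, "");
}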
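The pattern above keeps everything from U+0001 upward, which is the XML 1.1 `Char` range, so U+0013 is never removed. Unless the document declares `version="1.1"`, it is parsed as XML 1.0, whose `Char` production excludes every control character below U+0020 except tab, LF and CR (and even XML 1.1 only permits such "restricted" characters as character references like `&#x13;`). A minimal sketch of the same regex idea restricted to the XML 1.0 set; `Xml10Sanitizer` and `stripNonValidXml10Characters` are hypothetical names, not part of any library:

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes characters that XML 1.0 does not allow.
public class Xml10Sanitizer {
    // Matches runs of code points outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // \x{...} names code points directly, so supplementary characters are kept,
    // while an unpaired surrogate (a lone code point in D800-DFFF) is removed.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\x{9}\\x{A}\\x{D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+");

    public static String stripNonValidXml10Characters(String xml) {
        return INVALID_XML10.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "<root>a" + (char) 0x13 + "b</root>";
        System.out.println(stripNonValidXml10Characters(dirty)); // <root>ab</root>
    }
}
```

The cleaned string can then be passed to the Unmarshaller as before. Like any stripping approach, this silently deletes the offending characters rather than preserving them.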

1 Answer:

Answer 0 (score: 0)

In the end, I settled on the following approach:

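// escapeXml10 drops characters XML 1.0 does not allow; unescapeXml then undoes the escaping of everything else
xml = org.apache.commons.lang3.StringEscapeUtils.unescapeXml(org.apache.commons.lang3.StringEscapeUtils.escapeXml10(xml));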