How do I replace invalid characters in an XML string?

Time: 2012-08-03 14:11:23

Tags: java xml-parsing

I have a UTF-16 encoded string. When I parse it with javax.xml.parsers.DocumentBuilder, I get the following error:

Character reference "&#x0" is an invalid XML character

Here is the code I use to parse the XML:

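import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlString));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
org.w3c.dom.Document document = parser.parse(inputSource);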

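My question is: how do I replace the invalid characters with a space?

3 answers:

Answer 0: (score: 1)

You just need to use String.replaceAll and pass it a pattern that matches the invalid characters.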
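A minimal sketch of that idea, assuming the goal is to replace with a space anything outside the XML 1.0 character range. Because the error in the question mentions the character reference "&#x0" rather than a raw NUL, the sketch also checks numeric character references. The class name XmlSanitizer and its methods are hypothetical, and the sketch does not treat CDATA sections or comments specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class XmlSanitizer {
    // Literal characters that fall outside the XML 1.0 Char production.
    private static final Pattern INVALID_LITERAL = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Numeric character references such as &#x0; or &#65;.
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String sanitize(String xml) {
        // Step 1: the String.replaceAll idea, applied to raw characters.
        String cleaned = INVALID_LITERAL.matcher(xml).replaceAll(" ");

        // Step 2: replace references that point at an invalid code point.
        Matcher m = CHAR_REF.matcher(cleaned);
        StringBuffer out = new StringBuffer(); // StringBuffer works on every Java version
        while (m.find()) {
            int cp;
            try {
                cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                cp = -1; // too large to be a code point, treat as invalid
            }
            String replacement = isValidXmlChar(cp) ? m.group() : " ";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this in place, the parsing code from the question would pass XmlSanitizer.sanitize(xmlString) to the StringReader instead of xmlString.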

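Answer 1: (score: 0)

You are trying to parse an invalid XML entity, which is why the exception is thrown. UTF-16 does not appear to be the concern in your case.

Some explanation and examples can be found here.

For example, & cannot be used as a valid XML character; we need to use &amp; instead. Here &amp; is the XML entity.

The example above should be self-explanatory as to what an XML entity is.

As far as I know, some XML entities are invalid. But there is no need to worry: you can declare and add a new XML entity. See the article above for more details.


Edit: this assumes that & characters are what make the XML invalid.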
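As a rough illustration of the entity points in this answer, the sketch below parses a document whose internal DTD subset declares a custom entity, alongside the predefined &amp;. The class EntityDemo, the entity name sep, and the sample document are all made up for the example. Note that a numeric character reference such as &#x0; is not a named entity, so a declaration like this would not make it parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class EntityDemo {
    // Made-up document: the internal DTD subset declares an entity named "sep".
    static final String WITH_DECLARED_ENTITY =
        "<!DOCTYPE root [<!ENTITY sep \" | \">]>"
        + "<root>bread&sep;butter &amp; jam</root>";

    // Parses the XML and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder parser =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = parser.parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }
}
```

Calling EntityDemo.rootText(EntityDemo.WITH_DECLARED_ENTITY) expands both &sep; and &amp; in the returned text, whereas a document containing a bare & in its text is rejected by the parser.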

Answer 2: (score: 0)

Use escapeXml from StringEscapeUtils (Apache Commons Lang):

public static void escapeXml(java.io.Writer writer,
                             java.lang.String str)
                      throws java.io.IOException

Escapes the characters in a String using XML entities.

For example: "bread" & "butter" => &quot;bread&quot; &amp; &quot;butter&quot;.

Supports only the five basic XML entities (gt, lt, quot, amp, apos). 
Does not support DTDs or external entities.

Note that unicode characters greater than 0x7f are currently escaped to their 
numerical \\u equivalent. This may change in future releases.

Parameters:
    writer - the writer receiving the unescaped string, not null
    str - the String to escape, may be null 
Throws:
    java.lang.IllegalArgumentException - if the writer is null 
    java.io.IOException - if there is a problem writing
See Also:
    unescapeXml(java.lang.String)
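To make the five-entity behaviour described above concrete, here is a small hand-rolled sketch. It is not the Commons Lang implementation, and the class name BasicXmlEscaper is hypothetical. It is meant for text content or attribute values, not for a whole document (that would escape the markup as well), and it does not remove characters such as U+0000 that XML 1.0 forbids outright.

```java
// Hypothetical stand-in that mimics only the five basic XML entities.
class BasicXmlEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }
}
```

For instance, BasicXmlEscaper.escape("\"bread\" & \"butter\"") yields the same result as the Javadoc example.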