Fatal error: Character reference "&#..." (org.xml.sax.SAXParseException)

Time: 2019-03-25 10:17:34

Tags: java xml sax saxparser

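The <BATCHNAME>&#4; Any</BATCHNAME> tag value in my XML request contains the character reference '&#4;'. Without it, my code works perfectly, but in some cases the value includes it, and then I get the following error: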

  

[Fatal Error] :144:28: Character reference "&#
org.xml.sax.SAXParseException; lineNumber: 144; columnNumber: 28; Character reference "&#
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at d.b(AllCommonTasks.java:277)
    at ...

I need these characters for validation.

This is the code I am trying:

try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();

    URLConnection urlConnection = new URL(urlString).openConnection();
    urlConnection.addRequestProperty("Accept", "application/xml");
    urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
    Document doc = db.parse(urlConnection.getInputStream());
    doc.getDocumentElement().normalize();

    str = convertDocumentToString(doc);
} catch (Exception e) {
    System.err.println("In exception 1");
    e.printStackTrace();
}

How can I resolve this?

1 answer:

Answer 0 (score: 0)

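According to the Wikipedia page for XML and HTML entity references, references of the form &#nnnn; are numeric character references that give a Unicode code point in decimal. That means &#4; is equivalent to Unicode U+0004 END OF TRANSMISSION, which is a non-printing control character.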

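So I think the parser is correct to fail in this case: of the control characters below U+0020, XML 1.0 allows only tab (U+0009), line feed (U+000A), and carriage return (U+000D), and a character reference must refer to an allowed character.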
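To see the behaviour in isolation, here is a minimal sketch that feeds two small documents to the JDK's default DocumentBuilder. The class name CharRefDemo and the helper parses are made up for illustration and are not part of the question's code. It assumes the default JAXP parser bundled with the JDK. The expectation is that the reference to tab (&#9;) is accepted and the reference to U+0004 (&#4;) is rejected with a SAXParseException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CharRefDemo {

    // Hypothetical helper: true if the XML text parses, false if the
    // parser rejects it with a SAXParseException.
    public static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // Raised for well-formedness errors such as an invalid character reference.
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Anything else is unexpected in this demo.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &#9; refers to tab, which XML 1.0 allows.
        System.out.println(parses("<BATCHNAME>&#9; Any</BATCHNAME>"));
        // &#4; refers to U+0004, which XML 1.0 does not allow.
        System.out.println(parses("<BATCHNAME>&#4; Any</BATCHNAME>"));
    }
}
```

The default error handler may still print a "[Fatal Error]" line to standard error for the second document, as in the question.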

In fact, if you look at the source of com.sun.org.apache.xerces.internal.impl.XMLScanner#scanCharReferenceValue, you can see that it calls com.sun.org.apache.xerces.internal.util.XMLChar#isValid:

/**
 * Returns true if the specified character is valid. This method
 * also checks the surrogate character range from 0x10000 to 0x10FFFF.
 * <p>
 * If the program chooses to apply the mask directly to the
 * <code>CHARS</code> array, then they are responsible for checking
 * the surrogate character range.
 *
 * @param c The character to check.
 */
public static boolean isValid(int c) {
    return (c < 0x10000 && (CHARS[c] & MASK_VALID) != 0) ||
           (0x10000 <= c && c <= 0x10FFFF);
} // isValid(int):boolean
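
If the characters have to survive for later validation, one possible workaround (a sketch, not something the answer above proposes) is to rewrite invalid references into a visible marker before handing the text to the parser. The class XmlCharRefs, its methods, and the [U+0004] marker format are all hypothetical. isXml10Char spells out the XML 1.0 Char production directly, on the assumption that this is what the CHARS table encodes. The regex is deliberately naive and would also rewrite matches inside comments or CDATA sections, where &#4; is just literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlCharRefs {

    // Decimal (&#4;) or hexadecimal (&#x4;) numeric character reference.
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // XML 1.0 "Char" production, written out as explicit ranges.
    public static boolean isXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Replaces references to disallowed code points with a marker such as
    // [U+0004]; references to allowed code points are left untouched.
    public static String markInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.startsWith("x")
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body, 10);
            } catch (NumberFormatException e) {
                cp = -1; // number too large to be a code point
            }
            String replacement;
            if (isXml10Char(cp)) {
                replacement = m.group();
            } else if (cp < 0) {
                replacement = "[U+????]";
            } else {
                replacement = String.format("[U+%04X]", cp);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markInvalidCharRefs("<BATCHNAME>&#4; Any</BATCHNAME>"));
    }
}
```

The marked-up string can then be passed to DocumentBuilder.parse (for example through an InputSource wrapping a StringReader), and the validation step can look for the marker instead of the raw control character.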