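I have the following element:
<BATCHNAME>&#4; Any</BATCHNAME>
The tag value in my XML request contains '&#4;' characters. Without these characters my code works perfectly, but in some cases they are present, and then I get the following error: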
[Fatal Error] :144:28: Character reference "&#4" is an invalid XML character.
org.xml.sax.SAXParseException; lineNumber: 144; columnNumber: 28; Character reference "&#4" is an invalid XML character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at d.b(AllCommonTasks.java:277)
    at ...
I need these characters for validation.
This is the code I am trying:
// urlString, str and convertDocumentToString(...) are defined elsewhere in my class.
try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    // Fetch the XML over HTTP, asking the server for an XML response.
    URLConnection urlConnection = new URL(urlString).openConnection();
    urlConnection.addRequestProperty("Accept", "application/xml");
    urlConnection.addRequestProperty("User-Agent", "Mozilla/5.0 ( compatible ) ");
    // parse() is the call that throws the SAXParseException shown above.
    Document doc = db.parse(urlConnection.getInputStream());
    doc.getDocumentElement().normalize();
    str = convertDocumentToString(doc);
} catch (Exception e) {
    System.err.println("In exception 1");
    e.printStackTrace();
}
How can I solve this?
Answer 0: (score: 0)
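Take a look at the Wikipedia page for XML and HTML entity references: a reference that follows the &#nnnn; pattern is a Unicode code point in decimal form, which means &#4; is equivalent to Unicode U+0004: END OF TRANSMISSION, which is a non-printing character.
XML 1.0 does not allow this character anywhere in a document, not even when it is written as a character reference, so I think the parser is correct to fail in this case.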
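To see the behaviour in isolation, here is a minimal sketch that feeds two small documents to the JDK's default DocumentBuilder. The class name CharRefDemo and the helper parses are made up for this illustration, and the second document assumes the element from the question contains &#4;.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the asker's code.
class CharRefDemo {

    /** Returns true if the text parses as well-formed XML, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but does not print "[Fatal Error]" to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException for invalid character references.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // &#9; is a tab, which XML 1.0 allows.
        System.out.println(parses("<BATCHNAME>&#9; Any</BATCHNAME>")); // true
        // &#4; is U+0004, which XML 1.0 forbids even as a reference.
        System.out.println(parses("<BATCHNAME>&#4; Any</BATCHNAME>")); // false
    }
}
```

The only difference between the two inputs is the referenced code point, which suggests the failure comes from the character itself and not from the surrounding request code.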
In fact, if you look at the source of com.sun.org.apache.xerces.internal.impl.XMLScanner#scanCharReferenceValue, you can see that it refers to com.sun.org.apache.xerces.internal.util.XMLChar#isValid:
    /**
     * Returns true if the specified character is valid. This method
     * also checks the surrogate character range from 0x10000 to 0x10FFFF.
     * <p>
     * If the program chooses to apply the mask directly to the
     * <code>CHARS</code> array, then they are responsible for checking
     * the surrogate character range.
     *
     * @param c The character to check.
     */
    public static boolean isValid(int c) {
        return (c < 0x10000 && (CHARS[c] & MASK_VALID) != 0) ||
               (0x10000 <= c && c <= 0x10FFFF);
    } // isValid(int):boolean
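XMLChar lives in an internal JDK package, so application code should not call it directly. If you want to know in advance whether a response contains references the parser will reject, a rough sketch is to apply the same rule yourself. The class XmlCharRefScan and both of its methods are hypothetical names for this example. The range check is the XML 1.0 Char production written out by hand, not the Xerces lookup table, and the regular expression is a simplification that also matches text inside comments and CDATA sections, where it is not actually a reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xerces or the JDK.
class XmlCharRefScan {

    // Numeric character references: hexadecimal (&#x4;) or decimal (&#4;).
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    /** XML 1.0 Char production: tab, LF, CR, and the listed code point ranges. */
    static boolean isValidXml10Char(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Returns every numeric character reference in rawXml that names a forbidden code point. */
    static List<String> findInvalidCharRefs(String rawXml) {
        List<String> bad = new ArrayList<>();
        Matcher m = CHAR_REF.matcher(rawXml);
        while (m.find()) {
            int codePoint;
            try {
                codePoint = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                codePoint = -1; // too large for an int, so certainly not a valid code point
            }
            if (!isValidXml10Char(codePoint)) {
                bad.add(m.group());
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(findInvalidCharRefs("<BATCHNAME>&#4; Any</BATCHNAME>")); // [&#4;]
        System.out.println(findInvalidCharRefs("<BATCHNAME>&#9; Any</BATCHNAME>")); // []
    }
}
```

Running the scan on the raw response text before calling db.parse would let the validation step report which references are illegal instead of stopping at the first SAXParseException.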