I have the following code that sends an HTTP request, receives the response (as XML) and parses it:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public Document getDocumentElementFromDatabase() {
    // this URL is actually built dynamically from a query, but for this example I just use one of the possible resulting URLs
    String url = "http://musicbrainz.org/ws/2/recording?query=%22Thunderstruck%22+AND+artistname%3A%222Cellos%22";
    // declared outside the try block so that the finally block can see it
    HttpURLConnection connection = null;
    try {
        // sleep between successive requests to avoid flooding the server
        Thread.sleep(1000);
        connection = runQuery(url);
        // the stream is closed automatically once parsing is done
        try (InputStream stream = new BufferedInputStream(connection.getInputStream())) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream);
        }
    }
    // I've grouped exception handling for this example
    catch (ParserConfigurationException | InterruptedException | SAXException | IOException e) {
        e.printStackTrace();
    }
    finally {
        if (connection != null) connection.disconnect();
    }
    return null;
}

// creates (but does not yet send) a request with the required User-Agent header
private HttpURLConnection runQuery(String url) throws MalformedURLException, IOException {
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "MyAppName/1.0 ( myemail@email.email )");
    return connection;
}
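This code is called many times, and sometimes I get the following error:
[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.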
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
...
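The error can be reproduced offline with nothing but the JDK's built-in DOM parser. The sketch below is only an illustration; `PrologErrorDemo` and `describeParse` are hypothetical names. It parses three bodies: XML with a declaration, XML without one, and a JSON body. The first two both parse, because the `<?xml ...?>` declaration is optional in XML 1.0. The JSON body fails on line 1 with the same "Content is not allowed in prolog" error, because `{` cannot start an XML document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologErrorDemo {

    // Tries to parse a response body as XML and describes the outcome.
    static String describeParse(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but stops the JDK parser
            // from printing its own "[Fatal Error] ..." line to stderr
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(
                    new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return "parsed <" + doc.getDocumentElement().getTagName() + ">";
        } catch (SAXParseException e) {
            // The same exception type as in the stack trace from the question
            return "failed at " + e.getLineNumber() + ":" + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) {
        // XML with a declaration
        System.out.println(describeParse("<?xml version=\"1.0\" encoding=\"UTF-8\"?><metadata/>"));
        // XML without a declaration, which is still well-formed
        System.out.println(describeParse("<metadata/>"));
        // A JSON body (made-up content)
        System.out.println(describeParse("{\"recordings\": []}"));
    }
}
```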
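If I open the URL in Chrome, I get a valid XML response every time, no matter how often I reload. What's more, when I run exactly the same code on my laptop, the problem doesn't seem to occur.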
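After some tinkering, I tried printing the InputStreams directly as strings (using method 4 from this link) instead of parsing them. I noticed that some responses don't actually have the expected XML header (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>), while others do.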
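To see what the server actually sent, you could read the body once into a `String`, inspect it, and only then hand it to the parser. An `InputStream` can be consumed only once. The sketch below is a minimal example that assumes a UTF-8 body; `ResponseSniffer`, `readBody` and `sniff` are hypothetical names. It guesses the format from the first non-whitespace character. This is only a heuristic: an HTML error page also starts with `<`, so it separates XML-like bodies from JSON and is not a real content-type check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResponseSniffer {

    // Reads the whole stream into a String, assuming the body is UTF-8 encoded.
    static String readBody(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Guesses the payload format from its first non-whitespace character.
    static String sniff(String body) {
        String text = body.strip();
        // Skip a leading byte order mark, which strip() does not treat as whitespace
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1).strip();
        }
        if (text.startsWith("<")) {
            return "xml";
        }
        if (text.startsWith("{") || text.startsWith("[")) {
            return "json";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(); the JSON text is made up
        InputStream fake = new ByteArrayInputStream(
                "{\"recordings\": []}".getBytes(StandardCharsets.UTF_8));
        String body = readBody(fake);
        System.out.println(sniff(body) + ": " + body);
    }
}
```

Because the body is now a `String`, you can still parse it afterwards by passing `new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8))` to the `DocumentBuilder`.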
My guess is that I'm doing something wrong with the stream, but I can't figure out what.
Answer (score: 0)
I found the problem. The site sometimes seems to return a JSON response instead of XML, and that makes the parser choke. I added the following line to runQuery:
connection.setRequestProperty("Accept", "application/xml");
I can now run the code without errors.
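As an extra safeguard alongside the `Accept` header, you could check the response's `Content-Type` before parsing. Then an unexpected body (JSON or anything else) gets reported as such instead of showing up as a parser error. The sketch below assumes the server labels XML responses with an XML media type. `XmlQuery`, `isXmlContentType` and `fetchXml` are hypothetical names, and `runQuery` here mirrors the question's method with the answer's extra header added.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlQuery {

    // The question's runQuery plus the answer's Accept header.
    static HttpURLConnection runQuery(String url) throws IOException {
        // URI.create(...).toURL() avoids the URL(String) constructor deprecated in recent JDKs
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestProperty("User-Agent", "MyAppName/1.0 ( myemail@email.email )");
        connection.setRequestProperty("Accept", "application/xml");
        return connection;
    }

    // True for XML media types such as "application/xml; charset=UTF-8", "text/xml"
    // or "application/atom+xml"; false for null or anything else.
    static boolean isXmlContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters like "; charset=UTF-8" and compare case-insensitively
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return base.endsWith("/xml") || base.endsWith("+xml");
    }

    // Parses the response only if the server labels it as XML; otherwise returns null.
    static Document fetchXml(String url)
            throws IOException, ParserConfigurationException, SAXException {
        HttpURLConnection connection = runQuery(url);
        try {
            // getContentType() connects if needed and reads the response headers
            String type = connection.getContentType();
            if (!isXmlContentType(type)) {
                System.err.println("Unexpected Content-Type: " + type);
                return null;
            }
            try (InputStream stream = new BufferedInputStream(connection.getInputStream())) {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

The header check costs nothing extra, because the headers are already part of the response the code reads.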