Parsing an XML document from the Internet

Asked: 2013-10-10 05:27:17

Tags: xml

I am new to XML parsing. I am trying to access the "I Heart Quotes" API. This is the code that produces the error:

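import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

String link = "http://www.iheartquotes.com/api/v1/random.xml";
URL url = new URL(link);
InputStream is = url.openStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);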

This is the error:

Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:256)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:345)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
    at com.nicolasekhoury.IHQuotes.IHQuotes.main(IHQuotes.java:28)

What should I do?

2 answers:

Answer 0: (score: 1)

When I open http://www.iheartquotes.com/api/v1/random.xml in a browser, the response contains escaped symbols, and I don't think it is XML at all. It is just free-form text.

Answer 1: (score: 0)

Visiting the mentioned resource gives output similar to this:

You are fairminded, just and loving.

[fortune] http://iheartquotes.com/fortune/show/46886

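This is not XML, because it is not well-formed: the response starts with plain text instead of an XML declaration or a root element, which is why the parser reports "Content is not allowed in prolog" at line 1, column 1.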
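The failure can be reproduced without any network access by feeding the same kind of text to the parser. The following is a minimal sketch; the class name PrologCheck and the helper parseError are made up for illustration, and the sample text is copied from the output shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

public class PrologCheck {
    // Hypothetical helper: returns null when the text parses as XML,
    // otherwise the message of the exception thrown by the parser.
    static String parseError(String text) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            db.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Plain text, as returned by the API: there is no root element
        String plain = "You are fairminded, just and loving.\n\n"
                + "[fortune] http://iheartquotes.com/fortune/show/46886";
        System.out.println("plain text: " + parseError(plain));

        // The same sentence wrapped in a root element is well-formed
        String wrapped = "<quote>You are fairminded, just and loving.</quote>";
        System.out.println("wrapped:    " + parseError(wrapped));
    }
}
```

The first call should report a parse error and the second should not, which suggests the problem is the content of the response rather than the parsing code.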

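What you should do depends on your goal, I think. If this is just for learning, find a real XML source (e.g. your Stack Overflow user feed) and play around with it. If you need to use this particular data source, look for a format other than XML.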
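If the plain-text response is all you need, one option is to skip the XML parser and read the body as text. The sketch below assumes the response always looks like the sample above, that is, the quote followed by a final attribution line starting with "[". That layout is an assumption based on a single sample, and the names PlainQuote, readAll and quoteOnly are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class PlainQuote {
    // Reads the whole stream as UTF-8 text; no XML parser is involved.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Assumption: if the last line starts with '[', it is the attribution
    // and everything before it is the quote. Otherwise return the whole body.
    static String quoteOnly(String body) {
        String trimmed = body.trim();
        int lastBreak = trimmed.lastIndexOf('\n');
        if (lastBreak >= 0 && trimmed.substring(lastBreak + 1).startsWith("[")) {
            return trimmed.substring(0, lastBreak).trim();
        }
        return trimmed;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(), using the sample output from above
        String sample = "You are fairminded, just and loving.\n\n"
                + "[fortune] http://iheartquotes.com/fortune/show/46886\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(quoteOnly(readAll(in)));
    }
}
```

In real use the ByteArrayInputStream would be replaced by the stream from url.openStream(), as in the question.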

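I just found out that they also offer HTML, which is not XML but can be read by an XML parser in some cases. Read their docs and try accessing http://www.iheartquotes.com/api/v1/random?format=html, which will give you output similar to this: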

<html>
<head>
<title>I Heart Quotes - Random Quote Widget</title>
<style type="text/css">/* ... */</style>
</head>
<body>
<table>
<tr>
<td>
<div class="rbroundbox">
    <div class="rbtop"><div></div></div>
            <div class="rbcontent">
<a target="_parent" 
   href='http://www.iheartquotes.com/fortune/show/halleys_comet_it_came_we_saw_we_drank'>
Halley's Comet: It came, we saw, we drank.
</a>
<div class="source">
<a target="_parent" 
   href="http://www.iheartquotes.com/fortune/rand?source=codehappy">[codehappy quote]</a>
</div>
</div><!-- /rbcontent -->
    <div class="rbbot"><div></div></div>
    </div><!-- /rbroundbox -->
</td></tr></table>
</body>
</html>
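
The sample above happens to be well-formed (all tags closed, all attributes quoted), so a DOM parser can read it. Below is a minimal sketch that pulls the quote out with XPath. The class name HtmlQuote and the method extractQuote are hypothetical, the embedded string is a shortened copy of the sample, and the XPath expression assumes the "rbcontent" markup stays as shown. HTML that is not well-formed (unclosed tags, undeclared entities such as &nbsp;) would still fail with a parse error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HtmlQuote {
    // Parses the markup as XML and returns the text of the first link
    // that is a direct child of <div class="rbcontent">.
    static String extractQuote(String html) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(html)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // normalize-space() trims the line breaks around the quote text
        return xpath.evaluate("normalize-space(//div[@class='rbcontent']/a)", doc);
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the sample response shown above
        String html = "<html><body><table><tr><td>"
                + "<div class=\"rbroundbox\"><div class=\"rbcontent\">"
                + "<a target=\"_parent\" href='http://www.iheartquotes.com/fortune/show/halleys_comet_it_came_we_saw_we_drank'>\n"
                + "Halley's Comet: It came, we saw, we drank.\n"
                + "</a>"
                + "<div class=\"source\"><a target=\"_parent\" "
                + "href=\"http://www.iheartquotes.com/fortune/rand?source=codehappy\">[codehappy quote]</a></div>"
                + "</div></div>"
                + "</td></tr></table></body></html>";
        System.out.println(extractQuote(html));
        // prints: Halley's Comet: It came, we saw, we drank.
    }
}
```

If the markup is not reliably well-formed, a dedicated HTML parser is likely the safer choice than an XML parser.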