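RSS Feed - Exception on an end tag while parsing

Asked: 2016-10-19 11:52:18

Tags: java xml rss feed rome

I am using rome-1.5.jar to parse RSS feeds. For some feeds, parsing fails with an error about a missing closing meta tag.

RSS feed link: New York Times RSS Feed Link

Here is the code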

    public static SyndFeed getRssFeed(String rsslUrl) {
        try {
            URL url = new URL(rsslUrl);
            HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
            httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
            SyndFeedInput input = new SyndFeedInput();
            return input.build(new XmlReader(httpcon.getInputStream()));
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

Here is the exception

com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 45: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:215)
    at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:133)
    at com.gold.eloop.server.util.RssUtil.getRssFeed(RssUtil.java:132)
    at com.gold.eloop.server.util.RssUtil.getRssForProfile(RssUtil.java:228)
    at com.gold.eloop.server.util.RssUtil.mergeRssProfiles(RssUtil.java:269)
    at com.gold.eloop.server.util.outbound.MailMerger.getTransmission(MailMerger.java:581)
    at com.gold.eloop.server.services.MessageServiceImpl.sendTestMessage(MessageServiceImpl.java:192)
    at com.gold.eloop.server.remoteservices.MessageServiceRemote.sendTestMessage(MessageServiceRemote.java:309)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.google.gwt.user.server.rpc.RPC.invokeAndEncodeResponse(RPC.java:562)
    at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(RemoteServiceServlet.java:188)
    at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:224)
    at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)
Caused by: org.jdom2.input.JDOMParseException: Error on line 45: The element type "meta" must be terminated by the matching end-tag "</meta>".
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
    at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
    at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:212)
    ... 34 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 45; columnNumber: 9; The element type "meta" must be terminated by the matching end-tag "</meta>".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
    ... 37 more

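Am I doing something wrong in this code? Please help me resolve this error.

1 answer:

Answer 0: (score: 2)

The specified URL http://www.nytimes.com/services/xml/rss/index.html does not return an RSS document.

It contains the following: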

<meta name="PT" content="Member Center">
<meta name="PST" content="RSS Page">

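These <meta> tags are never closed. That is legal HTML but not well-formed XML, so the RSS processor fails on them.

The page is a list of RSS feeds, not an RSS feed itself.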
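The failure can be reproduced without ROME or a network call. The following is a minimal sketch that uses only the JDK's built-in XML parser; the class name `WellFormedCheck`, the method `parseError`, and both sample strings are made up for illustration and are not part of the question's code. It feeds the parser a tiny HTML fragment with an unclosed <meta> tag and a tiny RSS document, and reports whether each one parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: returns null when the text parses as XML,
    // otherwise a description of the parse error.
    public static String parseError(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing its own message to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXException e) {
            // Raised when the input is not well-formed XML.
            return String.valueOf(e.getMessage());
        } catch (Exception e) {
            // I/O or parser configuration problems.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Illustrative HTML fragment: <meta> is never closed.
        String html = "<html><head>"
                + "<meta name=\"PT\" content=\"Member Center\">"
                + "</head></html>";
        // Illustrative, minimal RSS document.
        String rss = "<rss version=\"2.0\"><channel>"
                + "<title>Example</title></channel></rss>";

        System.out.println("HTML fragment: " + parseError(html));
        System.out.println("RSS document:  " + parseError(rss));
    }
}
```

For the HTML fragment the helper returns the parser's error message; for the RSS document it returns null. ROME delegates to an XML parser in the same way (the stack trace shows JDOM and Xerces underneath), so any HTML page with unclosed tags can be expected to fail like this.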

The first link on that page is http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml: try passing that to the RSS processor instead.
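If URLs come from users or configuration, it may help to check that a response looks like a feed before handing it to the RSS processor. The sketch below is one possible pre-check, not something ROME provides: it uses the JDK's StAX reader to read only the first start tag and compares its name with the usual feed roots (`rss` for RSS 0.9x/2.0, `feed` for Atom, `RDF` for RSS 1.0). The class and method names are hypothetical, and the sketch works on a string for simplicity; in real code the text would come from the HTTP response.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FeedRootCheck {

    // Hypothetical helper: local name of the first element in the text,
    // or null if no start tag could be read.
    public static String rootElementName(String text) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Hardening: do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(text));
            // Pull events only until the first start tag; later
            // malformed markup is never reached.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
        } catch (XMLStreamException e) {
            // Not XML at all, or broken before the first element.
        }
        return null;
    }

    // Hypothetical helper: true when the root element is a known feed root.
    public static boolean looksLikeFeed(String text) {
        String root = rootElementName(text);
        return "rss".equals(root) || "feed".equals(root) || "RDF".equals(root);
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"PT\" content=\"RSS Page\">"
                + "</head></html>";
        String rss = "<rss version=\"2.0\"><channel>"
                + "<title>Example</title></channel></rss>";

        System.out.println("HTML page looks like a feed: " + looksLikeFeed(html));
        System.out.println("RSS document looks like a feed: " + looksLikeFeed(rss));
    }
}
```

A pull parser is used here because it stops at the first element, so an HTML page is recognized by its `html` root instead of triggering the end-tag error further down. A check like this lets the caller report "this URL is a web page, not a feed" instead of surfacing a ParsingFeedException.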