我试图在报纸的任何类别中显示新闻中的文字。我使用本报的RSS。但是,当我运行代码时,有时我会在上面得到异常消息,有时它可以正常工作。这是我的RSS解析器代码:
我使用的rss页面是:
RSSFeedParser parser = new RSSFeedParser("http://www.cumhuriyet.com.tr/rss/5");
Feed feed = parser.readFeed();
RSS解析器代码:
package main;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;
public class RSSFeedParser {
static final String TITLE = "title";
static final String DESCRIPTION = "description";
static final String CHANNEL = "channel";
static final String LANGUAGE = "language";
static final String COPYRIGHT = "copyright";
static final String LINK = "link";
static final String AUTHOR = "author";
static final String ITEM = "item";
static final String PUB_DATE = "pubDate";
static final String GUID = "guid";
static final String IMG = "img";
final URL url;
public RSSFeedParser(String feedUrl) {
try {
this.url = new URL(feedUrl);
} catch (MalformedURLException e) {
throw new RuntimeException(e);
}
}
public Feed readFeed() {
Feed feed = null;
try {
boolean isFeedHeader = true;
// Set header values intial to the empty string
String description = "";
String title = "";
String link = "";
String language = "";
String copyright = "";
String author = "";
String pubDate = "";
String guid = "";
String img = "";
// First create a new XMLInputFactory
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
// Setup a new eventReader
InputStream in = read();
XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
// read the XML document
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
if (event.isStartElement()) {
String localPart = event.asStartElement().getName()
.getLocalPart();
switch (localPart) {
case ITEM:
if (isFeedHeader) {
isFeedHeader = false;
feed = new Feed(title, link, description, language,
copyright, pubDate);
}
event = eventReader.nextEvent();
break;
case TITLE:
title = getCharacterData(event, eventReader);
break;
case DESCRIPTION:
description = getCharacterData(event, eventReader);
break;
case LINK:
link = getCharacterData(event, eventReader);
break;
case GUID:
guid = getCharacterData(event, eventReader);
break;
case LANGUAGE:
language = getCharacterData(event, eventReader);
break;
case AUTHOR:
author = getCharacterData(event, eventReader);
break;
case PUB_DATE:
pubDate = getCharacterData(event, eventReader);
break;
case COPYRIGHT:
copyright = getCharacterData(event, eventReader);
break;
}
} else if (event.isEndElement()) {
if (event.asEndElement().getName().getLocalPart() == (ITEM)) {
FeedMessage message = new FeedMessage();
message.setDescription(description);
message.setPubDate(pubDate);
message.setLink(link);
message.setTitle(title);
message.setImg(img);
feed.getMessages().add(message);
event = eventReader.nextEvent();
continue;
}
}
}
} catch (XMLStreamException e) {
throw new RuntimeException(e);
}
return feed;
}
private String getCharacterData(XMLEvent event, XMLEventReader eventReader)
throws XMLStreamException {
String result = "";
event = eventReader.nextEvent();
if (event instanceof Characters) {
result = event.asCharacters().getData();
}
return result;
}
private InputStream read() {
try {
return url.openStream();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
异常消息是:
Exception in thread "main" java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[5,3]
Message: The element type "meta" must be terminated by the matching end-tag "</meta>".
at main.RSSFeedParser.readFeed(RSSFeedParser.java:112)
at cumhuriyet.Dunya.cumDunya(Dunya.java:32)
at automation.ServerInteraction.main(ServerInteraction.java:83)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[5,3]
Message: The element type "meta" must be terminated by the matching end-tag "</meta>".
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(Unknown Source)
at main.RSSFeedParser.readFeed(RSSFeedParser.java:58)
... 2 more
答案 0 :(得分:0)
看起来它是一个真人文件;即一个经常变化的人。其中也没有标记的迹象。
我可以想到发生了什么的两种解释:
有时文档生成或创建不正确。
- 醇>
有时您会收到HTML错误页面而不是文档 你期待的,XML解析器无法处理标签 HTML&#39>。
要跟踪此情况,您将不得不捕获导致解析失败的精确输入。