I'm trying to read and parse the RSS feed of NASA's Image of the Day. My code is further down. When I run it, I get this exception:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at Start.processFeed(Start.java:30)
at Loader.main(Loader.java:12)
What am I doing wrong?
P.S. Of course, I have another class that contains the main method :)
Thanks in advance.
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class Start extends DefaultHandler {
    private String url = "http://www.nasa.gov/rss/dyn/image_of_the_day.rss";
    private boolean inUrl = false;
    private boolean inTitle = false;
    private boolean inDescription = false;
    private boolean inItem = false;
    private boolean inDate = false;

    public void processFeed() {
        try {
            SAXParserFactory factory =
                    SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(this);
            InputStream inputStream = new URL(url).openStream();
            reader.parse(new InputSource(inputStream));
        } catch(Exception e) {
            e.printStackTrace();
        }
    } // processFeed

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        if(localName.startsWith("item")) { inItem = true; }
        else if (inItem) {
            if(localName.equals("title")) { inTitle = true; }
            else { inTitle = false; }
            if(localName.equals("description")) { inDescription = true; }
            else { inDescription = false; }
            if(localName.equals("pubDate")) { inDate = true; }
            else { inDate = false; }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String chars = new String(ch).substring(start, start + length);
        if(inTitle) { System.out.println(chars); }
        if(inDescription) { System.out.println(chars); }
        if(inDate) { System.out.println(chars); }
    }
}
Answer 0 (score: 1)
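The response entity is gzip-encoded (in other words, compressed)! The parser is being handed the raw compressed bytes, which are not valid UTF-8, hence the MalformedByteSequenceException. You can wrap the input stream in a GZIPInputStream: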
InputStream inputStream = new GZIPInputStream(new URL(url).openStream());
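The failure can be reproduced without touching the network. The following is a minimal sketch, not the asker's code: the class name GzipSaxDemo and its helper methods are made up for illustration, and the tiny XML string stands in for the real feed. It gzips the document in memory, hands the compressed bytes straight to a SAX parser, and then tries again with the stream wrapped in a GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the question.
public class GzipSaxDemo {

    // Compress a string with gzip, standing in for a compressed response body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Returns true if the stream parses as XML, false if the parser rejects it.
    static boolean parses(InputStream in) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(in), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<rss><channel><title>demo</title></channel></rss>");

        // Compressed bytes fed directly to the parser, as in the question.
        System.out.println("raw gzip bytes parse: "
                + parses(new ByteArrayInputStream(compressed)));

        // Same bytes, decompressed on the fly before parsing.
        System.out.println("unwrapped bytes parse: "
                + parses(new GZIPInputStream(new ByteArrayInputStream(compressed))));
    }
}
```

A gzip stream starts with the bytes 0x1f 0x8b, and 0x8b cannot begin a UTF-8 sequence, so the first attempt should be rejected almost immediately while the second should go through.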
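You should read the URL the "long way" through a URLConnection, which gives you more control over the connection and lets you check whether the content is actually compressed: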
// needs java.net.HttpURLConnection, java.net.URL and java.util.zip.GZIPInputStream;
// urlString is the feed address, reader is the XMLReader from the question
URL url = new URL(urlString);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// we're not really connected now. Just the connection object has been created
// here you can set additional request properties (e.g. request headers)
con.connect();
// now we are connected!
if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
    try (InputStream entityStream = con.getInputStream()) {
        InputStream is;
        if ("gzip".equals(con.getContentEncoding())) {
            is = new GZIPInputStream(entityStream); // wrap
        } else {
            is = entityStream;
        }
        reader.parse(new InputSource(is));
    }
} else {
    // handle HTTP response code != OK
}
con.disconnect();
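The check above trusts the Content-Encoding header. If you would rather not depend on it, one possible alternative (an assumption on my part, not something the feed requires) is to peek at the first two bytes of the body and look for the gzip magic number 0x1f 0x8b. The class GzipSniffer and the method maybeGunzip below are hypothetical names for this sketch.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; not part of the JDK or of the original answer.
public class GzipSniffer {

    // Returns a stream of decompressed bytes if the input looks like gzip,
    // otherwise returns the (buffered) input unchanged.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);               // remember the start so the peek can be undone
        int first = in.read();
        int second = in.read();
        in.reset();               // rewind to the marked position
        boolean looksLikeGzip = first == 0x1f && second == 0x8b;
        return looksLikeGzip ? new GZIPInputStream(in) : in;
    }
}
```

With such a helper the parse call would become reader.parse(new InputSource(GzipSniffer.maybeGunzip(entityStream))), and the header test could be dropped. The trade-off is that it guesses from the content rather than from what the server declared.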