Java IOException when parsing a page's URL to create a DOM with TagSoup

Asked: 2015-01-03 08:38:16

Tags: java dom ioexception tag-soup

I am trying to build a DOM tree for a URL with the code below (this is the specific URL that produces the exception):

    String url = "http://www.kingfisher.org/";

    Parser p = new Parser();        // org.ccil.cowan.tagsoup.Parser
    SAX2DOM sax2dom;
    org.w3c.dom.Node doc;

    // Turn off namespace processing and namespace-prefix reporting
    p.setFeature(Parser.namespacesFeature, false);
    p.setFeature(Parser.namespacePrefixesFeature, false);
    sax2dom = new SAX2DOM(true);
    p.setContentHandler(sax2dom);
    p.parse(new InputSource(url));  // the IOException is thrown here
    doc = sax2dom.getDOM();

But when I run my program for this URL, it throws an exception at p.parse(new InputSource(url)); and I don't know why. Until now it has worked without any problems.

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.kingfisher.org/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1838)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
at org.ccil.cowan.tagsoup.Parser.getInputStream(Parser.java:510)
at org.ccil.cowan.tagsoup.Parser.getReader(Parser.java:487)
at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:440)
at pageparsertest.PageParserTest.main(PageParserTest.java:92)

Any hints?

1 Answer:

Answer 0: (score: 1)

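If you open the connection yourself with HttpURLConnection, you should be able to request the page from Java. A 403 means the server refused the request, possibly because of the default User-Agent that Java sends, so set a browser-like one before reading the stream. Try the following code: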

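    String url = "http://www.kingfisher.org/";
    URL uri = new URL(url);
    HttpURLConnection httpcon = (HttpURLConnection) uri.openConnection();
    // Send a browser-like User-Agent header with the request
    httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
    // p is the TagSoup Parser from the question; parse the stream instead of the URL
    p.parse(new InputSource(httpcon.getInputStream()));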
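If you want the status code checked before the stream reaches the parser, the same idea can be wrapped in a small helper. This is only a sketch using the plain JDK: the class name BrowserLikeFetcher, the method open, and the 10-second timeouts are my own assumptions, not part of TagSoup or of the code above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper (not part of TagSoup): opens a URL with an explicit
// User-Agent header and fails early with the HTTP status if it is not 200.
class BrowserLikeFetcher {

    static InputStream open(String address, String userAgent) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Replace Java's default User-Agent with the one supplied by the caller
        con.setRequestProperty("User-Agent", userAgent);
        // Assumed timeouts so a slow server cannot block the caller forever
        con.setConnectTimeout(10_000);
        con.setReadTimeout(10_000);

        int status = con.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            con.disconnect();
            throw new IOException("HTTP " + status + " for " + address);
        }
        // The caller is responsible for closing the returned stream
        return con.getInputStream();
    }
}
```

With the parser from the question, the call would then look like p.parse(new InputSource(BrowserLikeFetcher.open(url, "Mozilla/4.76")));. If the server still answers 403 with a browser-like User-Agent, it is blocking the request for some other reason and this approach will not help.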