Question

I'm scraping data from a website: I fetch the HTML code from the site and then parse it in Java.

I'm currently using java.net.URL together with java.net.URLConnection. This is the code I use to get the HTML from a website (found on this website and edited slightly to fit my needs):

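import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getURL(String name) throws Exception {

    //Set URL
    URL url = new URL(name);
    URLConnection spoof = url.openConnection();

    //Spoof the connection so we look like a web browser
    spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

    //StringBuilder avoids copying the whole page on every appended line
    StringBuilder s = new StringBuilder();

    //try-with-resources closes the reader (and the connection's stream) when done
    try (BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()))) {
        String strLine;

        //Loop through every line in the source
        while ((strLine = in.readLine()) != null) {

            //Append each line to the result
            s.append(strLine).append('\n');
        }
    }
    return s.toString();
}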

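When I run it, roughly 100 to 200 pages return their HTML correctly. However, before I finish scraping, I get a "java.io.IOException: Server returned HTTP response code: 503 for URL" exception. I've researched this topic thoroughly, and other questions such as this one don't cover the packages I'm using.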
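The exception comes from getInputStream(), which throws for error statuses such as 503 (Service Unavailable). A common way to cope with intermittent 503s is to check the status first and retry after a pause. The sketch below assumes the server's 503s are temporary; the class name RetryingFetcher, the attempt limit and the base delay are hypothetical values chosen for illustration, not something from the original question. It switches to HttpURLConnection so the status can be read with getResponseCode() and honors a numeric Retry-After header when the server sends one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: fetches a page, retrying with a delay when the server answers 503.
final class RetryingFetcher {

    // Assumed values for illustration; tune them for the site being scraped.
    static final int MAX_ATTEMPTS = 5;
    static final long BASE_DELAY_MILLIS = 1_000;

    // How long to wait before the next attempt (attempt is 0-based).
    // Uses Retry-After when it is given in seconds; otherwise falls back
    // to exponential backoff: 1 s, 2 s, 4 s, ...
    static long retryDelayMillis(String retryAfter, int attempt) {
        if (retryAfter != null) {
            try {
                return Long.parseLong(retryAfter.trim()) * 1_000;
            } catch (NumberFormatException ignored) {
                // Retry-After may also be an HTTP date; this sketch does not parse that form.
            }
        }
        return BASE_DELAY_MILLIS << attempt;
    }

    static String getURL(String name) throws IOException, InterruptedException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(name).openConnection();
            conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

            // getResponseCode() reports the status instead of throwing on 503,
            // unlike getInputStream().
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_UNAVAILABLE) {
                long delay = retryDelayMillis(conn.getHeaderField("Retry-After"), attempt);
                conn.disconnect();
                Thread.sleep(delay);
                continue;
            }

            StringBuilder s = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    s.append(line).append('\n');
                }
            }
            return s.toString();
        }
        throw new IOException("Still getting 503 after " + MAX_ATTEMPTS + " attempts: " + name);
    }
}
```

If the 503s are triggered by request volume, sleeping briefly between successful requests may also keep the limit from being hit in the first place.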

Thanks in advance for your help!

Answer 1

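Perhaps the server is limiting your requests (503 means Service Unavailable). In that case, you could try using a Socket with its input/output streams instead of URLConnection.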
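A minimal sketch of the Socket approach, assuming plain http:// URLs (https:// would need an SSLSocket). The class SocketFetcher and its methods are hypothetical names for illustration. It sends an HTTP/1.0 GET with a Host header, since HTTP/1.0 lets the server close the connection after the response and avoids chunked transfer encoding, so the code can simply read until end of stream. Note that the server still sees ordinary HTTP requests, so if it is rate-limiting you it can answer 503 here too; the difference is that the status arrives as the text of the first response line rather than as an exception, which statusCode() extracts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a page over a raw Socket instead of URLConnection.
final class SocketFetcher {

    // Builds a minimal HTTP/1.0 GET request; the blank line ends the headers.
    static String buildGetRequest(String host, int port, String path, String userAgent) {
        String hostHeader = (port == 80) ? host : host + ":" + port;
        return "GET " + (path.isEmpty() ? "/" : path) + " HTTP/1.0\r\n"
                + "Host: " + hostHeader + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "\r\n";
    }

    // Extracts the numeric code from a status line such as "HTTP/1.1 503 Service Unavailable".
    static int statusCode(String statusLine) {
        return Integer.parseInt(statusLine.split(" ", 3)[1]);
    }

    // Returns the raw response: status line, headers, blank line, then the body.
    static String fetch(String name, String userAgent) throws IOException {
        URL url = new URL(name);
        int port = (url.getPort() == -1) ? 80 : url.getPort();
        try (Socket socket = new Socket(url.getHost(), port)) {
            OutputStream out = socket.getOutputStream();
            // getFile() is the path plus any query string, e.g. "/page?id=3"
            String request = buildGetRequest(url.getHost(), port, url.getFile(), userAgent);
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            StringBuilder response = new StringBuilder();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }
}
```

Calling statusCode() on the first line of the returned text lets the caller decide whether to wait and retry on 503 before parsing the HTML.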
