Question

I want to scrape the URL "http://www.gc-zb.com/index/index.html", but when I do, I get an error:

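import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class InvitedBids {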

    public static void main(String[] args) throws IOException {

        InputStream inputStream=null;
        HttpURLConnection httpConn=null;
        InputStreamReader inputStreamReader=null;
        BufferedReader bufferedReader=null;
        StringBuilder contentBuf=null;
        String myURL="http://www.gc-zb.com/index/index.html";
        URL url= null;
        try {
            url = new URL(myURL);
            System.out.println(url);
            httpConn= (HttpURLConnection) url.openConnection(); 
            httpConn.setRequestMethod("GET");

            inputStream=httpConn.getInputStream();  //error occurs
            inputStreamReader=new InputStreamReader(inputStream,"utf-8"); 
            bufferedReader=new BufferedReader(inputStreamReader); 
            String line="";
            contentBuf=new StringBuilder();
            while ((line = bufferedReader.readLine())!= null) {  
                contentBuf.append(line);
            }
            String buf=contentBuf.toString();
            System.out.println(buf);
        } catch (Exception e) {
            e.printStackTrace();
        }finally {
           //close I/O and HTTP
        }

    }
}

The console output is:

http://www.gc-zb.com/index/index.html
java.io.IOException: Server returned HTTP response code: 521 for URL: http://www.gc-zb.com/index/index.html
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at com.feilong.bid.InvitedBids.main(InvitedBids.java:43)

Does anyone know how to solve this? Thank you!

Answer 1

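As Jon said earlier, error 521 means the server is down. It is a non-standard status code that Cloudflare returns when it cannot reach the origin web server, so don't worry and wait until the server is back up. See the Cloudflare docs.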

However, if you want to scrape documents, I strongly recommend using jsoup to fetch the data. We use it at my company and it works very well.

For example:

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

Documentation: https://jsoup.org/cookbook/input/load-document-from-url
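If you would rather stay with HttpURLConnection, here is a minimal sketch of how you might look at what the server actually sent instead of letting getInputStream() throw. It reads the status code first and, for 4xx/5xx responses, takes the body from getErrorStream(). The class and method names (StatusAwareFetch, Response, get, readAll) and the 10-second timeouts are my own choices for illustration, not something from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StatusAwareFetch {

    /** Hypothetical holder for one GET result: status code plus whatever body was sent. */
    public static final class Response {
        public final int status;
        public final String body;

        public Response(int status, String body) {
            this.status = status;
            this.body = body;
        }

        /** True only for 2xx status codes. */
        public boolean isSuccess() {
            return status >= 200 && status < 300;
        }
    }

    /** Performs a GET and returns status and body without throwing on 4xx/5xx. */
    public static Response get(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // assumed timeout values
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            // For status >= 400, getInputStream() throws; the body (if any) is on getErrorStream().
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Response(status, readAll(in));
        } finally {
            conn.disconnect();
        }
    }

    /** Reads a stream as UTF-8 text, one '\n' after each line; a null stream yields "". */
    public static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Response r = get("http://www.gc-zb.com/index/index.html");
        System.out.println("HTTP " + r.status + (r.isSuccess() ? " (ok)" : " (not ok)"));
        System.out.println(r.body);
    }
}
```

This does not make a 521 go away, but printing the status and the error body usually tells you whether the site is really down or is just rejecting your request.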

Answer 2

I recommend using org.apache.commons.httpclient.HttpClient to solve the problem

java.io.IOException: Server returned HTTP response code: 521 for URL:
