Java: fetch a web page's source or time out

Asked: 2013-05-02 08:52:42

Tags: java web web-scraping

I am trying to fetch data from a web page, but if the page is unavailable the program hangs for a long time before it times out. I need it to try fetching the page for 10 seconds and, if it gets no response, return null. How can I make it work that way?

Here is how I fetch the data:

public int getThreadData( String address ) throws IOException{
    String valueString = null;
    URL url = new URL( "http://" + address + ":8080/web-console/ServerInfo.jsp" );
    URLConnection urlConnection = url.openConnection();
    urlConnection.setRequestProperty( "User-Agent",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401" );
    BufferedReader br = new BufferedReader( new InputStreamReader
        ( urlConnection.getInputStream(), "UTF-8" ) );

    String inputLine;

    while ( ( inputLine = br.readLine() ) != null )
    {
        if ( inputLine.contains( "#Threads" ) )
        {
            valueString = inputLine.substring( inputLine.indexOf( "/b>" ) + 3 );
            valueString = valueString.substring( 0, valueString.indexOf( "<" ) );
        }
    }
    br.close();

    return Integer.parseInt( valueString );

}
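For reference, the substring chain in the loop above extracts the number sitting between the closing </b> tag and the next `<` on the matching line. Isolated as a small helper (the class and method names here are illustrative, not from the original post), the logic looks like this:

```java
public class ThreadCountParser {
    // Mirrors the parsing in getThreadData: given a line such as
    // "<b>#Threads</b>42<br>", return the number after the closing </b> tag.
    public static int parseThreadCount(String line) {
        // Skip past "/b>" (3 characters), then take everything up to the next '<'.
        String v = line.substring(line.indexOf("/b>") + 3);
        return Integer.parseInt(v.substring(0, v.indexOf('<')));
    }
}
```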

2 Answers:

Answer 0 (score: 2)

Have you tried setting a connection timeout, like this:

urlConnection.setConnectTimeout(10000); // 10000 milliseconds
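Note that `setConnectTimeout` only caps the time spent establishing the TCP connection; if the server accepts the connection but never answers, the read can still block indefinitely, so `setReadTimeout` is needed as well. A minimal sketch of the null-on-timeout behavior the question asks for (the `fetchOrNull` helper and its parameterized timeout are illustrative, not from the original post):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

public class TimeoutFetch {
    // Returns the page body, or null if no response arrives within timeoutMillis.
    public static String fetchOrNull(String urlString, int timeoutMillis)
            throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(urlString).openConnection();
        conn.setConnectTimeout(timeoutMillis); // cap on establishing the connection
        conn.setReadTimeout(timeoutMillis);    // cap on waiting for data once connected
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } catch (SocketTimeoutException e) {
            return null; // connect or read exceeded the deadline
        }
    }
}
```

For the question's 10-second requirement, call it with `timeoutMillis = 10000`.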

Answer 1 (score: 1)

You should use an HTTP library (such as Apache's HttpClient), which greatly simplifies this kind of thing. With HttpClient, you can do the following:

// Set the timeout to 20 seconds.
final HttpParams httpParams = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(httpParams, 20 * 1000);
HttpConnectionParams.setSoTimeout(httpParams, 20 * 1000);

DefaultHttpClient httpClient = new DefaultHttpClient(httpParams);
HttpPost postRequest = new HttpPost(URL);
HttpResponse response = httpClient.execute(postRequest);
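The Apache classes shown above (`DefaultHttpClient`, `BasicHttpParams`) have since been deprecated. On Java 11 and later, the built-in `java.net.http.HttpClient` supports the same timeouts without an external dependency; a sketch under that assumption (the `fetchOrNull` helper is illustrative, not from the original answer):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

public class ModernFetch {
    // Returns the page body, or null if the request does not finish in time.
    public static String fetchOrNull(String url, Duration deadline) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(deadline) // cap on establishing the connection
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(deadline)        // cap on the whole request/response exchange
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (HttpTimeoutException e) {
            return null; // timed out
        } catch (IOException | InterruptedException e) {
            return null; // unreachable host and similar failures also yield null here
        }
    }
}
```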