Question

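I have tried other ways of downloading information from a URL, but I need a faster one. I have to download and parse about 250 separate pages, and I would like the app not to feel painfully slow. This is the code I currently use to retrieve a single page. Any insight would be great.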

try
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    // ByteArrayBuffer (org.apache.http.util) starts with room for 50 bytes and grows as needed
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(50);
    int current = 0;
    // Reads the response one byte per call until end of stream (-1)
    while ((current = bufferedInputStream.read()) != -1) {
        byteArrayBuffer.append((byte) current);
    }
    // tempString is declared elsewhere; new String(byte[]) uses the platform default charset
    tempString = new String(byteArrayBuffer.toByteArray());

}
catch (Exception e)
{
    Log.i("Error", e.toString());
}

Answer 1

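If the requests go to the same server, try keeping the connection open. Also, avoid reallocations in the buffer, and read as much as possible in one call.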


final int APPROX_MAX_PAGE_SIZE = 300; // "usually enough" size in bytes; tune to your pages
try 
{
    URL myURL = new URL("http://www.google.com");
    URLConnection ucon = myURL.openConnection();
    ucon.setRequestProperty("Connection", "keep-alive"); // (1)
    InputStream inputStream = ucon.getInputStream();
    BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
    ByteArrayBuffer byteArrayBuffer = new ByteArrayBuffer(APPROX_MAX_PAGE_SIZE); // (2)
    byte[] buf = new byte[APPROX_MAX_PAGE_SIZE];
    int read;
    do {
       read = bufferedInputStream.read(buf, 0, buf.length); // (3)
       if (read > 0) byteArrayBuffer.append(buf, 0, read);
    } while (read >= 0); // read() returns -1 at end of stream
    tempString = new String(byteArrayBuffer.toByteArray());
    bufferedInputStream.close(); // release the connection so it can be reused

} 
catch (Exception e) 
{
    Log.i("Error", e.toString());
}

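(1) Set the Keep-Alive header (not sure it is needed; on J2SE keep-alive is also configurable, via the http.keepAlive system property).
(2) Allocate a buffer that is "usually large enough" to avoid reallocations.
(3) Read many bytes per call instead of one.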

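Disclaimer: this was written "blind", without access to a Java compiler, so some parameters may be wrong; if so, feel free to edit. The method for setting a request header is setRequestProperty, which is defined on URLConnection itself, so no cast to HttpURLConnection is needed.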
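For readers outside Android, here is a minimal JDK-only sketch of the same three steps. It assumes java.net.HttpURLConnection and ByteArrayOutputStream in place of Android's ByteArrayBuffer; the class name PageFetcher, the 16 KB chunk size and decoding as UTF-8 are illustrative assumptions, not part of the answer above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Hypothetical chunk size: bytes requested from the stream per read() call
    static final int CHUNK_SIZE = 16 * 1024;

    // (2) + (3): pre-size the output buffer and read in chunks, not byte by byte
    static String readFully(InputStream in, int expectedSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(expectedSize, 32));
        byte[] buf = new byte[CHUNK_SIZE];
        int read;
        while ((read = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, read);
        }
        // Assumes UTF-8; a real client should honour the charset in Content-Type
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // (1): keep-alive is already HttpURLConnection's default; the header only makes it explicit
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Connection", "keep-alive");
        int expected = conn.getContentLength(); // -1 when the server does not send it
        // try-with-resources closes the stream after the body has been read to the end,
        // which allows the underlying socket to be reused for the next request
        try (InputStream in = conn.getInputStream()) {
            return readFully(in, expected);
        }
    }
}
```

Calling PageFetcher.fetch(...) repeatedly for pages on the same host lets HttpURLConnection reuse the kept-alive socket, provided each body is read to the end and its stream closed, which is what the try-with-resources block does.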

Answer 2

Why not use the built-in Apache HTTP components?

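// Apache HttpClient bundled with older Android versions (deprecated in API 22, removed in API 23)
HttpClient httpClient = new DefaultHttpClient();
HttpGet request = new HttpGet(uri);
HttpResponse response = httpClient.execute(request);

int status = response.getStatusLine().getStatusCode();

// Read the body only when the server answered 200 OK
if (status == HttpStatus.SC_OK) {
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    response.getEntity().writeTo(ostream);
    String page = ostream.toString("UTF-8"); // assumes the page is UTF-8 encoded
}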

Answer 3

Use a pooled HttpClient and try issuing 2 or 3 requests at a time. Also try creating a memory pool to avoid allocations and GC pauses.
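A minimal plain-Java sketch of that advice, assuming a fixed pool of 3 worker threads (the "2 or 3 requests at a time") and a per-thread reusable read buffer as a very simple stand-in for a memory pool (it reuses only the read buffer, not the output). ParallelDownloader, StreamOpener, URL_OPENER and the 16 KB buffer are hypothetical names and choices; a pooled HTTP client could be plugged in through StreamOpener, and on Android this would have to run off the UI thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelDownloader {
    // Hypothetical hook so the transport can be swapped (plain URL, pooled client, test stub)
    interface StreamOpener {
        InputStream open(String url) throws IOException;
    }

    // Default opener: plain URL connection (HttpURLConnection keeps connections alive by default)
    static final StreamOpener URL_OPENER = url -> URI.create(url).toURL().openStream();

    // Simple "memory pool": each worker thread reuses one read buffer across all its pages
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[16 * 1024]);

    static String download(String url, StreamOpener opener) throws IOException {
        byte[] buf = BUFFER.get();
        ByteArrayOutputStream out = new ByteArrayOutputStream(buf.length);
        try (InputStream in = opener.open(url)) {
            int read;
            while ((read = in.read(buf)) != -1) {
                out.write(buf, 0, read);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumes UTF-8 pages
    }

    // Downloads every URL with at most `parallelism` requests in flight; results keep input order
    static List<String> downloadAll(List<String> urls, int parallelism, StreamOpener opener)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> download(url, opener)));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> f : futures) {
                pages.add(f.get()); // a failed download resurfaces here as ExecutionException
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the 250 pages from the question, downloadAll(urls, 3, ParallelDownloader.URL_OPENER) keeps at most three requests in flight and returns the pages in the same order as urls. A fixed, small pool is the design choice here: it overlaps network latency without opening hundreds of sockets at once.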

Is there a faster way to download a web page into a string?
