How to make batch HTTP calls in Java

Time: 2017-06-26 22:35:10

Tags: java http batch-file apache-httpclient-4.x

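I am trying to fetch data from another service over HTTP using HttpClient. The URI should look like endpoint:80/.../itemId.

Is there a way to make a batch call that specifies a set of itemIds? I did find a suggestion to use .setHeader(HttpHeaders.CONNECTION, "keep-alive") when creating the request. If I do that, how do I release the client once all the data has been fetched?

Also, it seems this approach still has to receive one response before sending the next request. Can this be done asynchronously, and if so, how? By the way, it looks like I cannot use AsyncHttpClient in this case.

Since I know almost nothing about HttpClient, this question may seem silly. I would really appreciate some help.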

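1 answer:

Answer 0: (score: 1)

API support on the server

The API may well support requesting multiple IDs at once (for example, with a URL of the form http://endpoint:80/.../itemId1,itemId2,itemId3). Check the API documentation to see whether this is available, because if it is, it will be the best solution.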
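If the server does accept a comma-separated list of IDs, building the batch URL is a small string operation. The following is a minimal sketch under that assumption; the class name BatchUri, the method build, and the base path /items are hypothetical and stand in for the real endpoint path, which the question does not give. It also assumes the IDs contain no characters that need URL encoding.

```java
import java.util.Arrays;
import java.util.List;

class BatchUri {
    // Joins the item ids with commas and appends them to the base URI.
    // Hypothetical helper: the real API may use a different separator or a query parameter.
    static String build(String baseUri, List<String> itemIds) {
        if (itemIds.isEmpty()) {
            throw new IllegalArgumentException("at least one item id is required");
        }
        // Make sure exactly one slash separates the base URI from the id list.
        String base = baseUri.endsWith("/") ? baseUri : baseUri + "/";
        return base + String.join(",", itemIds);
    }

    public static void main(String[] args) {
        // "/items" is a placeholder path, not taken from the original question.
        System.out.println(build("http://endpoint:80/items",
                Arrays.asList("itemId1", "itemId2", "itemId3")));
    }
}
```

The resulting string can then be passed to new HttpGet(...) as a single request, which avoids one round trip per item.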

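Persistent connections

By default, Apache HttpClient uses persistent ("keep-alive") connections (see the Connection Management tutorial linked in @kichik's comment). The logging facilities can help verify that connections are being reused across multiple requests.

To release the client, call its close() method. From 2.3.4. Connection manager shutdown:

"When an HttpClient instance is no longer needed and is about to go out of scope it is important to shut down its connection manager to ensure that all connections kept alive by the manager get closed and system resources allocated by those connections are released."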

CloseableHttpClient httpClient = <...>
httpClient.close();

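Persistent connections eliminate the overhead of establishing new connections, but, as you noted, the client still waits for a response before sending the next request.

Multithreading and connection pooling

You can make the program multithreaded and use a PoolingHttpClientConnectionManager to control the number of connections to the server. Here is an example based on 2.3.3. Pooling connection manager and 2.4. Multithreaded request execution: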

import java.io.*;
import org.apache.http.*;
import org.apache.http.client.*;
import org.apache.http.client.methods.*;
import org.apache.http.client.protocol.*;
import org.apache.http.impl.client.*;
import org.apache.http.impl.conn.*;
import org.apache.http.protocol.*;

// ...
PoolingHttpClientConnectionManager cm =
        new PoolingHttpClientConnectionManager();
cm.setMaxTotal(200); // increase max total connection to 200
cm.setDefaultMaxPerRoute(20); // increase max connection per route to 20
CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(cm)
        .build();

String[] urisToGet = { ... };
// start a thread for each URI
// (if there are many URIs, a thread pool would be better)
Thread[] threads = new Thread[urisToGet.length];
for (int i = 0; i < threads.length; i++) {
    HttpGet httpget = new HttpGet(urisToGet[i]);
    threads[i] = new Thread(new GetTask(httpClient, httpget));
    threads[i].start();
}
// wait for all the threads to finish
for (int i = 0; i < threads.length; i++) {
    threads[i].join();
}

class GetTask implements Runnable {
    private final CloseableHttpClient httpClient;
    private final HttpContext context;
    private final HttpGet httpget;

    public GetTask(CloseableHttpClient httpClient, HttpGet httpget) {
        this.httpClient = httpClient;
        this.context = HttpClientContext.create();
        this.httpget = httpget;
    }

    @Override
    public void run() {
        try {
            CloseableHttpResponse response = httpClient.execute(
                httpget, context);
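            try {
                HttpEntity entity = response.getEntity();
                // Read the body here, then make sure it is fully consumed
                // so the connection can be returned to the pool and reused.
                org.apache.http.util.EntityUtils.consume(entity);
            } finally {
                response.close();
            }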
        } catch (ClientProtocolException ex) {
            // handle protocol errors
        } catch (IOException ex) {
            // handle I/O errors
        }
    }
}

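Multithreading helps saturate the link (keep as much data flowing as possible), because while one thread is sending a request, other threads can be receiving responses and using the downlink.

Pipelining

HTTP/1.1 supports pipelining, which sends multiple requests on a single connection without waiting for the responses. There is an example in section 3.10. Pipelined request execution of the Asynchronous I/O based on NIO tutorial: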
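The comment in the example above notes that a thread pool would be better when there are many URIs. A minimal sketch of that variant, using only java.util.concurrent, might look like the following. The class PooledFetcher and the fetchOne parameter are hypothetical: fetchOne stands in for whatever code calls httpClient.execute(...) and returns the response body, so the sketch itself makes no HTTP calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PooledFetcher {
    // Runs fetchOne for every URI on a fixed-size thread pool and
    // returns the results in the same order as the input list.
    static List<String> fetchAll(List<String> uris, int poolSize,
                                 Function<String, String> fetchOne) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit all tasks first so they can run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String uri : uris) {
                Callable<String> task = () -> fetchOne.apply(uri);
                futures.add(pool.submit(task));
            }
            // Then wait for each result in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("fetch failed", e.getCause());
                }
            }
            return results;
        } finally {
            // Stop accepting new tasks; submitted tasks have already completed or failed.
            pool.shutdown();
        }
    }
}
```

It is probably sensible to keep poolSize at or below the per-route connection limit set on the connection manager, since extra threads would only block waiting for a free connection.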

HttpProcessor httpproc = <...>
HttpAsyncRequester requester = new HttpAsyncRequester(httpproc);
HttpHost target = new HttpHost("www.apache.org");
List<BasicAsyncRequestProducer> requestProducers = Arrays.asList(
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/index.html")),
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/foundation/index.html")),
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/foundation/how-it-works.html"))
);
List<BasicAsyncResponseConsumer> responseConsumers = Arrays.asList(
    new BasicAsyncResponseConsumer(),
    new BasicAsyncResponseConsumer(),
    new BasicAsyncResponseConsumer()
);
HttpCoreContext context = HttpCoreContext.create();
Future<List<HttpResponse>> future = requester.executePipelined(
    target, requestProducers, responseConsumers, pool, context, null);

A complete version of this example is available in the HttpCore Examples ("Pipelined HTTP GET requests").

Older web servers may not handle pipelined requests correctly.
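To make the idea concrete at the wire level, the sketch below builds the bytes a client would write for several pipelined GET requests: the requests are simply written back to back on one connection, and the server is expected to answer them in the same order. This only illustrates the message layout, not how HttpCore implements pipelining; the class PipelinedRequests and its build method are hypothetical, and no socket is opened.

```java
import java.util.Arrays;
import java.util.List;

class PipelinedRequests {
    // Concatenates one minimal HTTP/1.1 GET request per path.
    // Each request ends with an empty line; nothing separates consecutive requests.
    static String build(String host, List<String> paths) {
        StringBuilder sb = new StringBuilder();
        for (String path : paths) {
            sb.append("GET ").append(path).append(" HTTP/1.1\r\n")
              .append("Host: ").append(host).append("\r\n")
              .append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same host and paths as the HttpCore example above.
        System.out.print(build("www.apache.org",
                Arrays.asList("/index.html", "/foundation/index.html")));
    }
}
```

A client that pipelines writes this whole payload before reading anything, then reads the responses one after another from the same connection.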