Question

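I am using version 4.2.5 of AutoRetryHttpClient from org.apache.httpcomponents to download a PDF file from a URL whose scheme is https. The code is written in NetBeans 7.3 and uses JDK 7.

Suppose the fictional PDF resource is located at https://www.thedomain.with/my_resource.pdf. I have the following code: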

    SchemeRegistry registry = new SchemeRegistry();
    try {
        final SSLSocketFactory sf = new SSLSocketFactory(new TrustStrategy() {
            @Override
            public boolean isTrusted(X509Certificate[] chain, String authType)
                    throws CertificateException {
                return true;
            }
        });

        registry.register(new Scheme("https", 3920, sf));
    } catch (NoSuchAlgorithmException | KeyManagementException | KeyStoreException | UnrecoverableKeyException ex) {
        Logger.getLogger(HttpConnection.class.getName()).log(Level.SEVERE, null, ex);
    }
    // Here I create the client.
    HttpClient client = new AutoRetryHttpClient(
            new DefaultHttpClient(new PoolingClientConnectionManager(registry)),
            new DefaultServiceUnavailableRetryStrategy(
                    5,      // max number of retries
                    100));  // retry interval

    HttpResponse httpResponse = null;
    try {
        HttpGet httpget = new HttpGet("https://www.thedomain.with/my_resource.pdf");
        // I set header and Mozilla User-Agent
        httpResponse = client.execute(httpget);
    } catch (IOException ex) {
    }
    // ... other lines of code to get and save the file, not really important since the code is never reached

When I call client.execute, the following exception is thrown:

org.apache.http.conn.HttpHostConnectException: Connection to https://www.thedomain.with refused

How can I get that PDF resource?

PS: I can download it with a browser, so there is a way to get the file.

Answer 1

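There seem to be a couple of issues:

1. You registered the Scheme with 3920 as its default port, which is a non-standard port number for HTTPS. If the server really were running on that port, you would have to access it in the browser with this URL: https://www.thedomain.with:3920/my_resource.pdf. Since the URL you use in the browser does not include port 3920, the server must be running on the default port 443, so you should change new Scheme("https", 3920, sf) to new Scheme("https", 443, sf).
2. The CN in the server's certificate does not seem to match its hostname, which causes an SSLPeerUnverifiedException. To make it work, you need to use the SSLSocketFactory(TrustStrategy, HostnameVerifier) constructor and pass a verifier that does not perform this check. Apache provides AllowAllHostnameVerifier for this purpose.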
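
The point about the port can be checked with the JDK alone. The following is a minimal sketch; the class name DefaultPortCheck and the helper effectivePort are made up for illustration. It only parses the URL strings and does not open a connection.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class DefaultPortCheck {

    // Returns the port a client would connect to for the given URL:
    // the explicit port if one is present, otherwise the scheme's default.
    static int effectivePort(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        int port = url.getPort(); // -1 when the URL carries no explicit port
        return port != -1 ? port : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        // No explicit port: falls back to the https default.
        System.out.println(effectivePort("https://www.thedomain.with/my_resource.pdf"));      // prints 443
        // Explicit port: this is what the browser URL would need if the server listened on 3920.
        System.out.println(effectivePort("https://www.thedomain.with:3920/my_resource.pdf")); // prints 3920
    }
}
```

Because the URL in the question has no explicit port, HttpClient falls back to the default port registered for the https scheme, which in the question's code is 3920.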

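Note: You really should not use a no-op TrustStrategy and HostnameVerifier in production code, because this essentially turns off all the security checks that authenticate the remote server and leaves you open to impersonation attacks.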
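
With that caveat in mind, the two changes could look roughly like the sketch below. It assumes HttpClient 4.2.x is on the classpath; the class name InsecureClientSketch and the methods insecureRegistry and newClient are hypothetical names chosen for this example, and the retry settings (5 retries, interval 100) are copied from the question. Treat it as something for testing only, not as a recommended configuration.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;

import org.apache.http.client.HttpClient;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.conn.ssl.AllowAllHostnameVerifier;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.impl.client.AutoRetryHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultServiceUnavailableRetryStrategy;
import org.apache.http.impl.conn.PoolingClientConnectionManager;

public class InsecureClientSketch {

    // Builds a registry whose https scheme uses port 443 and skips both
    // certificate trust checks and hostname verification. TEST USE ONLY.
    static SchemeRegistry insecureRegistry() throws GeneralSecurityException {
        TrustStrategy trustAll = new TrustStrategy() {
            @Override
            public boolean isTrusted(X509Certificate[] chain, String authType) {
                return true; // accepts any certificate chain
            }
        };
        // Two-argument constructor: trust strategy plus a hostname verifier
        // that does not compare the certificate CN with the host name.
        SSLSocketFactory sf = new SSLSocketFactory(trustAll, new AllowAllHostnameVerifier());

        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("https", 443, sf)); // 443 instead of 3920
        return registry;
    }

    // Wraps the insecure registry in the same retrying client as in the question.
    static HttpClient newClient() throws GeneralSecurityException {
        return new AutoRetryHttpClient(
                new DefaultHttpClient(new PoolingClientConnectionManager(insecureRegistry())),
                new DefaultServiceUnavailableRetryStrategy(5, 100));
    }
}
```

A client obtained from newClient() can then be used with the HttpGet from the question. Catching GeneralSecurityException covers the four checked exceptions the SSLSocketFactory constructor declares.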
