Question

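I want to download some files from the Internet using Java.

The files sit behind a 302 redirect and some security. After authentication, the redirect points to an Amazon S3 bucket.

For example, https://example.com/getFile?Name=TYPEA authenticates and then redirects to https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz;

https://example.com/getFile?Name=TYPEB authenticates and then redirects to https://exampleB.s3.amazonaws.com/TYPEB/TYPEB.file.gz;

If I go to the http:// version, it redirects to the https:// version and then continues as above (adding one extra redirect).

I wrote a piece of Java code to download one file. When my application starts, this code runs in one thread per file (e.g. 2 files = 2 threads running the same code). The generic code for each file does the following: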

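Put the Basic authorization info (encoded) into the header
Open the connection
Check the response code
If it is 301, 302 or 303, read the "Location" header and loop over the above until the file is found (response 200).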
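
The steps above can be sketched as a small helper with no network access. This is only an illustration: the class name `RedirectSteps` and its methods are hypothetical and do not come from the original code. It covers the two decisions made on each response: whether the status is one of the redirects being followed, and what the next URL is. `URI.resolve` is used on the assumption that a relative `Location` value should also be handled.

```java
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper (not from the original post): the per-response
// decisions of a manual redirect loop, kept free of any network I/O.
final class RedirectSteps {

    // True for the three status codes the loop follows (301, 302, 303).
    static boolean isFollowedRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_SEE_OTHER;
    }

    // Resolves the Location header against the URL that produced it, so
    // absolute and relative Location values both yield an absolute URI.
    static URI nextTarget(URI current, String locationHeader) {
        if (locationHeader == null || locationHeader.isEmpty()) {
            throw new IllegalArgumentException("redirect without a Location header");
        }
        return current.resolve(locationHeader);
    }

    public static void main(String[] args) {
        URI start = URI.create("https://example.com/getFile?Name=TYPEA");
        URI next = nextTarget(start, "https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz");
        System.out.println(isFollowedRedirect(302) + " -> " + next);
    }
}
```

Keeping these decisions separate from the connection handling makes it easier to log, per thread, exactly which URL produced which `Location` value.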

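My initial problem was that once I reached the Amazon part of the redirect chain, I had to remove the Authorization header, because Amazon does not accept Basic authorization alongside its own Signature auth key. After that, the code worked on a single thread for a single file; but as soon as I ran it with two threads, the Location headers got their URL host names mixed up, so the two "Location" headers are either: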
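
The header handling described above can be sketched as a pure decision function. The names `AuthPolicy` and `shouldSendBasicAuth` are hypothetical, and the query parameter values in the usage example are made-up placeholders. The sketch assumes the credentials belong only to the host of the first URL: Basic auth is sent to a redirect target only when the host is unchanged and the query carries no `Signature` parameter (the same marker the question's code checks for).

```java
import java.net.URI;

// Hypothetical helper (not from the original post): decides whether the
// Basic Authorization header should be attached to a redirect target.
final class AuthPolicy {

    static boolean shouldSendBasicAuth(URI original, URI target) {
        // getRawQuery() returns null when the URL has no query string.
        String query = target.getRawQuery();
        boolean presigned = query != null && query.contains("Signature");
        boolean sameHost = original.getHost() != null
                && original.getHost().equalsIgnoreCase(target.getHost());
        return sameHost && !presigned;
    }

    public static void main(String[] args) {
        URI first = URI.create("https://example.com/getFile?Name=TYPEA");
        // Placeholder query values, for illustration only.
        URI s3 = URI.create("https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz"
                + "?Expires=1&Signature=placeholder");
        System.out.println(shouldSendBasicAuth(first, first)); // same host, no signature
        System.out.println(shouldSendBasicAuth(first, s3));    // different host, signed URL
    }
}
```

Checking the host as well as the query string means the credentials are not sent to a different host even if a redirect there happens to carry no query string.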

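or (it seems to happen at random)

This of course results in a 404 error for one file and a successful 200 for the other.

If I leave a gap so that the initial URLs are not accessed at the same time, the problem does not occur.

Does anyone have any suggestions as to why the HTTP response would have the wrong host in the Location header?

There are no static variables in the code (e.g. for the file location) that would cause this.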
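
One thing that may be worth ruling out: although the download code declares no static variables of its own, two calls it makes, `CookieHandler.setDefault(...)` and `HttpURLConnection.setFollowRedirects(...)`, change JVM-wide settings that every thread shares. With a shared cookie store, cookies the server sets in response to one thread's request may be sent on the other thread's requests to the same host. The sketch below only demonstrates that the handler is shared; it does not show that this is the cause of the mixed-up hosts. The class name `SharedCookieHandlerDemo` is hypothetical.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

// Hypothetical demo (not from the original post): CookieHandler.setDefault
// installs one handler for the whole JVM, not one per thread.
final class SharedCookieHandlerDemo {

    // Installs a handler on one thread and reports whether a second thread
    // sees the very same instance through CookieHandler.getDefault().
    static boolean handlerIsSharedAcrossThreads() {
        CookieHandler previous = CookieHandler.getDefault();
        CookieManager installed = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        boolean[] sameInstance = new boolean[1];
        try {
            Thread first = new Thread(() -> CookieHandler.setDefault(installed));
            first.start();
            first.join();

            Thread second = new Thread(
                    () -> sameInstance[0] = CookieHandler.getDefault() == installed);
            second.start();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        } finally {
            CookieHandler.setDefault(previous); // restore the earlier handler
        }
        return sameInstance[0];
    }

    public static void main(String[] args) {
        System.out.println("shared across threads: " + handlerIsSharedAcrossThreads());
    }
}
```

If shared cookies turn out to matter, one option to test is running without a default `CookieHandler` and passing cookies explicitly per connection, as the redirect loop already attempts to do.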

-Generic code (MCVE, where fileDownloadURLHeaderHostName = exampleA.s3.amazonaws.com or exampleB.s3.amazonaws.com):-

    InputStream inputStream = null;
    OutputStream outputStream = null;
    URL url;
    URI uri;
    HttpURLConnection conn;
    try
    {                   
        String authString = openFeedsUser + ":" + openFeedsPass;
        byte[] authEncBytes = Base64.getEncoder().encode(authString.getBytes());
        String authStringEnc = new String(authEncBytes);

        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        url = new URL(fileDownloadURL);
        uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        String correctEncodedURL=uri.toASCIIString(); 
        url = new URL(correctEncodedURL);
        conn = (HttpURLConnection)url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + authStringEnc);
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
        conn.setRequestProperty("Upgrade-Insecure-Requests", "1");
        conn.setRequestProperty("Host", fileDownloadURLHeaderHostName);
        conn.setRequestProperty("Accept-Language","en-US,en;q=0.9");
        conn.setRequestProperty("Cache-Control","max-age=0");
        conn.setRequestProperty("Connection","keep-alive");

        // Don't automatically follow redirects, we'll do it manually because we need to set the cookies again once we have the Amazon auth settings; before following the redirect
        conn.setInstanceFollowRedirects(false);
        HttpURLConnection.setFollowRedirects(false);

        boolean redirect = false;
        int redirectCount = 0;
        int redirectLimit = 10;

        // follow all redirects until the limit, or we get to a final destination
        do
        {
            // get the response code, and act accordingly
            int status = conn.getResponseCode();

            if (status != HttpURLConnection.HTTP_OK) 
            {
                if (status == HttpURLConnection.HTTP_MOVED_TEMP
                    || status == HttpURLConnection.HTTP_MOVED_PERM
                        || status == HttpURLConnection.HTTP_SEE_OTHER)
                {
                    redirect = true;
                    redirectCount = redirectCount + 1;
                    // get redirect url from "location" header field
                    String newUrl = conn.getHeaderField("Location");
                    url = new URL(newUrl);
                    uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
                    correctEncodedURL=uri.toASCIIString(); 
                    url = new URL(correctEncodedURL);

                    // get the cookie if need, for login
                    String cookies = conn.getHeaderField("Set-Cookie");

                    // open the new connnection again
                    url = new URL(newUrl);
                    conn = (HttpURLConnection) url.openConnection();

                    conn.setRequestMethod("GET");
                    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");
                    conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
                    conn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
                    conn.setRequestProperty("Upgrade-Insecure-Requests", "1");
                    conn.setRequestProperty("Host", fileDownloadURLHeaderHostName);
                    conn.setRequestProperty("Accept-Language","en-US,en;q=0.9");
                    conn.setRequestProperty("Cache-Control","max-age=0");
                    conn.setRequestProperty("Connection","keep-alive");

                    // only put the basic auth in the header, if the Amazon Signature query string is not present
                    // (getQuery() returns null when the URL has no query string)
                    String query = url.getQuery();
                    if (query == null || !query.contains("Signature"))
                    {
                        conn.setRequestProperty("Authorization", "Basic " + authStringEnc);
                    }

                    // set the cookies to be the same as the previous request, if the server sent any
                    if (cookies != null)
                    {
                        conn.setRequestProperty("Cookie", cookies);
                    }
                }
                else // something went wrong
                {
                    redirect = false;
                    InputStream errorStream = conn.getErrorStream();
                    StringBuilder errorMessage = new StringBuilder();
                    String line = null;

                    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(errorStream, StandardCharsets.UTF_8))) 
                    {   
                        while ((line = bufferedReader.readLine()) != null) 
                        {
                            errorMessage.append(line);
                        }
                    }

                    Log("error received while attempting to download file. Details: " + errorMessage.toString()); 
                    return false;
                }
            }
            else // success
            {
                redirect = false;

                inputStream = conn.getInputStream();
                // etc...
                return true;
            }
        }while (redirect == true && redirectCount < redirectLimit);

        if (redirectCount >= redirectLimit)
        {
            Log( "Too many redirects, so not getting file.");           
        }

        return false; //default
    }
    catch (IOException e)
    {
        e.printStackTrace();
        return false;
    }
    catch (URISyntaxException e)
    {
        e.printStackTrace();
        return false;
    }
    finally 
    {
        try 
        {
            if (inputStream != null)
            {
                inputStream.close();
            }
            if (outputStream != null)
            {
                outputStream.close();
            }
        } 
        catch (IOException ioe)
        {
            // nothing to see here
        }
    }

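-EDIT-

I found that the Host header was not being set, so I added System.setProperty("sun.net.http.allowRestrictedHeaders", "true"). This changed the problem: instead of a 404 error I now get a 403 (Amazon signature mismatch).

-EDIT 2-- The curl command curl -L -u user:pass -o file.gz "https://example.com/getFile?Name=TYPEA" works and downloads the file. So what is Java doing differently from a simple curl command?

-EDIT 3-- Even though a single curl works, replacing the HTTP code above with a ProcessBuilder call runs into the same problem. So it looks like a server-side issue, where two connections arriving close together from the same IP address cause it to mix up the hosts. What can I do in Java to help the server tell them apart? Is there some kind of session setting I should be using?

curl code: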

    String authString = openFeedsUser + ":" + openFeedsPass;
    String authParam = "-u";

    File downloadFile = new File(fileDownloadDirectory + File.separator + fileDownloadFileName);
    downloadFile.createNewFile();

    String canonicalDownloadFilePath = downloadFile.getCanonicalPath();

    String downloadPathParam = "-o";
    ProcessBuilder processBuilder = new ProcessBuilder("curl","-L",authParam, authString, downloadPathParam, canonicalDownloadFilePath, fileDownloadURL);

    Log(className, "getting file from URL : " + fileDownloadURL);
    Process curl = processBuilder.start();

    // wait for the process to end
    while(curl.isAlive())
    {
    }

    Log(className, "CURL Exit code: " + curl.exitValue());

    Log(className, "File Downloaded.");
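
As a side note on the process handling above, the empty `while(curl.isAlive())` loop keeps a CPU core busy while it waits. A minimal sketch of a blocking alternative, assuming Java 9 or later (for `ProcessBuilder.Redirect.DISCARD`): `Process.waitFor()` blocks until the child exits, and discarding the child's output avoids it stalling on a full pipe. The class and method names are hypothetical, and the usage example runs the current JVM's `java -version` only so that it works without curl or network access.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original post): run an external
// command and block until it finishes, without a busy-wait loop.
final class ProcessWait {

    // Starts the command, discards its stdout/stderr, and returns its exit code.
    static int runAndWait(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);                       // merge stderr into stdout
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD); // then drop the output
        try {
            Process process = builder.start();
            return process.waitFor(); // blocks until the process exits
        } catch (IOException e) {
            throw new UncheckedIOException("could not start " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }

    public static void main(String[] args) {
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        int exitCode = runAndWait(Arrays.asList(java, "-version"));
        System.out.println("exit code: " + exitCode);
    }
}
```

For the curl case, the command list would be the same arguments that are passed to `ProcessBuilder` in the code above.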

Multiple parallel connections to Amazon S3 servers with different hosts after a redirect intermittently give the wrong host in the Location header

0 Answers: