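I want to download some files from the Internet using Java.
The files sit behind a 302 redirect and some security. After authentication, the redirect points to an Amazon S3 bucket.
For example, https://example.com/getFile?Name=TYPEA authenticates and then redirects to https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz;
https://example.com/getFile?Name=TYPEB authenticates and then redirects to https://exampleB.s3.amazonaws.com/TYPEB/TYPEB.file.gz.
If I go to the http:// version, it redirects to the https:// version and then continues as above (adding an extra redirect).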
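To make the redirect chain concrete, here is a minimal sketch of how each Location value can be turned into the next URL to request. The class name RedirectTargets and the /login path are made up for illustration; the only assumption is that a Location value may be either absolute (as in the examples above) or relative to the URL that returned it.

```java
import java.net.URI;

final class RedirectTargets {
    private RedirectTargets() {}

    /**
     * Resolves a Location header value against the URL that produced it.
     * An absolute Location is returned unchanged; a relative one is
     * resolved against currentUrl. Both strings must already be valid
     * URIs, otherwise URI.create throws IllegalArgumentException.
     */
    static URI resolve(String currentUrl, String locationHeader) {
        return URI.create(currentUrl).resolve(locationHeader);
    }

    public static void main(String[] args) {
        // Absolute Location: the S3 URL is used as-is.
        System.out.println(resolve(
                "https://example.com/getFile?Name=TYPEA",
                "https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz"));
        // Relative Location (hypothetical path): resolved against the current host.
        System.out.println(resolve(
                "https://example.com/getFile?Name=TYPEA",
                "/login?next=TYPEA"));
    }
}
```

Resolving against the current URL keeps a manual redirect loop working even if one hop answers with a relative Location.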
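I wrote a piece of Java code to download one file. When my application starts, this code runs in one thread per file (e.g. 2 files = 2 threads running the same code). Every thread runs the same common code, which is listed below.
My initial problem was that, once on the Amazon part of the redirect, I needed to remove the Authorization header, because Amazon does not want Basic Authorization alongside its own Signature auth key. After that, the code worked for a single file on a single thread; but as soon as I ran it with two threads, the Location headers got the URL host names mixed up, so the two Location headers are either:
or (it seems to happen at random)
This of course leads to a 404 error for one file and a successful 200 for the other.
If I stagger the requests so that the initial URLs are not accessed at the same time, the problem does not occur.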
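The staggering workaround can be sketched without fixed delays by letting only one thread at a time perform the first, authenticated hop, while the S3 downloads still run in parallel. This is only a sketch of the workaround, not a fix for whatever mixes up the hosts. The class SerializedFirstHop and its methods are hypothetical names, and the Supplier stands in for the code that requests the initial URL and returns the Location value.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

final class SerializedFirstHop {
    // One lock shared by all download threads; only the first hop takes it.
    private static final ReentrantLock FIRST_HOP_LOCK = new ReentrantLock();

    private SerializedFirstHop() {}

    /** Runs firstHop while holding the shared lock, so first hops never overlap. */
    static <T> T run(Supplier<T> firstHop) {
        FIRST_HOP_LOCK.lock();
        try {
            return firstHop.get();
        } finally {
            FIRST_HOP_LOCK.unlock();
        }
    }

    /** Self-check: returns the highest number of threads seen inside run() at once. */
    static int maxObservedConcurrency(int threadCount) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        // Stand-in for the real first request; it only records the overlap.
        Supplier<Integer> fakeFirstHop = () -> {
            int now = inside.incrementAndGet();
            max.accumulateAndGet(now, Math::max);
            sleepQuietly(20);
            inside.decrementAndGet();
            return now;
        };
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> run(fakeFirstHop));
            workers[i].start();
        }
        for (Thread worker : workers) {
            joinQuietly(worker);
        }
        return max.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the download code, only the request to the initial example.com URL would go inside run(...); the redirected S3 request would stay outside the lock.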
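Does anyone have any suggestions as to why the HTTP response would have the wrong host in the Location header?
There are no static variables in the code (such as the file location) that would cause this.
-Common code (MCVE, where fileDownloadURLHeaderHostName
= exampleA.s3.amazonaws.com or exampleB.s3.amazonaws.com):-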
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Body of the download method; one thread runs it per file and it returns a boolean.
InputStream inputStream = null;
OutputStream outputStream = null;
URL url;
URI uri;
HttpURLConnection conn;
try
{
    String authString = openFeedsUser + ":" + openFeedsPass;
    String authStringEnc = Base64.getEncoder().encodeToString(authString.getBytes(StandardCharsets.UTF_8));
    // CookieHandler.setDefault replaces the JVM-wide handler, so all threads share one cookie store
    CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    url = new URL(fileDownloadURL);
    uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
    String correctEncodedURL = uri.toASCIIString();
    url = new URL(correctEncodedURL);
    conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Authorization", "Basic " + authStringEnc);
    conn.setRequestMethod("GET");
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");
    conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
    conn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
    conn.setRequestProperty("Upgrade-Insecure-Requests", "1");
    // Host is a restricted header: it is ignored unless sun.net.http.allowRestrictedHeaders=true
    conn.setRequestProperty("Host", fileDownloadURLHeaderHostName);
    conn.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
    conn.setRequestProperty("Cache-Control", "max-age=0");
    conn.setRequestProperty("Connection", "keep-alive");
    // Don't automatically follow redirects, we'll do it manually because we need to set the cookies again once we have the Amazon auth settings; before following the redirect
    conn.setInstanceFollowRedirects(false);
    // static setter: this changes the default for every HttpURLConnection in the JVM
    HttpURLConnection.setFollowRedirects(false);
    boolean redirect = false;
    int redirectCount = 0;
    int redirectLimit = 10;
    // follow all redirects until the limit, or we get to a final destination
    do
    {
        // get the response code, and act accordingly
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK)
        {
            if (status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_SEE_OTHER)
            {
                redirect = true;
                redirectCount = redirectCount + 1;
                // get redirect url from "location" header field
                String newUrl = conn.getHeaderField("Location");
                if (newUrl == null)
                {
                    Log("redirect response without a Location header.");
                    return false;
                }
                // get the cookie if needed, for login (only the first Set-Cookie header is read)
                String cookies = conn.getHeaderField("Set-Cookie");
                // use the Location value as sent (resolved against the current URL if relative);
                // the multi-argument URI constructor would re-quote '%' in an already-encoded signed URL
                url = new URL(url, newUrl);
                // open the new connection
                conn = (HttpURLConnection) url.openConnection();
                conn.setInstanceFollowRedirects(false);
                conn.setRequestMethod("GET");
                conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");
                conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3");
                conn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
                conn.setRequestProperty("Upgrade-Insecure-Requests", "1");
                conn.setRequestProperty("Host", fileDownloadURLHeaderHostName);
                conn.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
                conn.setRequestProperty("Cache-Control", "max-age=0");
                conn.setRequestProperty("Connection", "keep-alive");
                // only put the basic auth in the header, if the Amazon Signature query string is not present
                String query = url.getQuery();
                if (query == null || !query.contains("Signature"))
                {
                    conn.setRequestProperty("Authorization", "Basic " + authStringEnc);
                }
                // set the cookies to be the same as the previous request
                if (cookies != null)
                {
                    conn.setRequestProperty("Cookie", cookies);
                }
            }
            else // something went wrong
            {
                redirect = false;
                // getErrorStream() returns null when the server sent no error body
                InputStream errorStream = conn.getErrorStream();
                StringBuilder errorMessage = new StringBuilder();
                if (errorStream != null)
                {
                    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(errorStream, StandardCharsets.UTF_8)))
                    {
                        String line;
                        while ((line = bufferedReader.readLine()) != null)
                        {
                            errorMessage.append(line);
                        }
                    }
                }
                Log("error received while attempting to download file. Details: " + errorMessage);
                return false;
            }
        }
        else // success
        {
            redirect = false;
            inputStream = conn.getInputStream();
            // etc...
            return true;
        }
    } while (redirect && redirectCount < redirectLimit);
    if (redirectCount >= redirectLimit)
    {
        Log("Too many redirects, so not getting file.");
    }
    return false; //default
}
catch (IOException | URISyntaxException e)
{
    e.printStackTrace();
    return false;
}
finally
{
    try
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
        if (outputStream != null)
        {
            outputStream.close();
        }
    }
    catch (IOException ioe)
    {
        // nothing to see here
    }
}
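One thing worth checking in the code above: CookieHandler.setDefault and HttpURLConnection.setFollowRedirects are static setters, so although the code declares no static variables of its own, it still touches JVM-wide state shared by both threads. The sketch below only demonstrates that sharing for the cookie handler; it does not show that this is what causes the mixed-up Location headers. SharedStateDemo is a hypothetical name.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;

final class SharedStateDemo {
    private SharedStateDemo() {}

    /**
     * Installs a CookieManager from a worker thread and reports whether the
     * calling thread then sees that same instance as the default handler.
     * The previous default is restored before returning.
     */
    static boolean cookieHandlerIsShared() {
        CookieHandler previous = CookieHandler.getDefault();
        try {
            CookieManager installedByWorker = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
            Thread worker = new Thread(() -> CookieHandler.setDefault(installedByWorker));
            worker.start();
            worker.join();
            // Same object means the handler is process-wide, not per-thread.
            return CookieHandler.getDefault() == installedByWorker;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            CookieHandler.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        System.out.println("cookie handler shared across threads: " + cookieHandlerIsShared());
    }
}
```

If both threads send requests to example.com through one shared cookie store, a session cookie set for one request may be sent with the other, so this is one place to look when comparing the two threads' requests.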
-Edit-
I found that the Host header was not being set, so I added System.setProperty("sun.net.http.allowRestrictedHeaders", "true"). This changed the problem: I now get a 403 (Amazon signature error) instead of the 404.
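A minimal sketch for checking whether a header set with setRequestProperty is actually kept on the connection. Nothing is sent over the network, because openConnection() does not connect. RestrictedHeaderDemo and roundTrip are hypothetical names. As far as I can tell, the JDK reads sun.net.http.allowRestrictedHeaders once, when its HTTP handler is first loaded, so the sketch sets it before opening any connection (passing -Dsun.net.http.allowRestrictedHeaders=true on the command line is the other option).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class RestrictedHeaderDemo {
    private RestrictedHeaderDemo() {}

    /**
     * Sets a request header on a not-yet-connected HttpURLConnection and
     * returns what the connection reports back for it (null if it was dropped).
     */
    static String roundTrip(String header, String value) {
        try {
            URL url = new URL("https://example.com/getFile?Name=TYPEA");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty(header, value);
            return conn.getRequestProperty(header);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Must happen before the first connection is opened in this JVM.
        System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
        System.out.println("User-Agent -> " + roundTrip("User-Agent", "demo-agent"));
        System.out.println("Host -> " + roundTrip("Host", "exampleA.s3.amazonaws.com"));
    }
}
```

Without an override, HttpURLConnection derives the Host header from the URL being requested, so each hop normally gets the right host without any explicit setting.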
-Edit 2--
The curl command curl -L -u user:pass -o file.gz "https://example.com/getFile?Name=TYPEA"
works and downloads the file. So what is Java doing differently from a simple curl command?
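One documented difference is how credentials are handled across redirects: with -L, curl sends the -u credentials only to the initial host and drops them when a redirect leads to a different host (unless --location-trusted is given). The Java code instead decides by looking for "Signature" in the query string. A hedged sketch of the host-based rule follows; CredentialScope and sameHost are hypothetical names, and comparing only the host (not scheme or port) is a simplification.

```java
import java.net.URI;

final class CredentialScope {
    private CredentialScope() {}

    /**
     * Returns true when the redirect target has the same host as the URL the
     * credentials were originally meant for, i.e. when it is reasonable to
     * keep sending the Basic Authorization header.
     */
    static boolean sameHost(String originalUrl, String redirectUrl) {
        String originalHost = URI.create(originalUrl).getHost();
        String redirectHost = URI.create(redirectUrl).getHost();
        return originalHost != null && originalHost.equalsIgnoreCase(redirectHost);
    }

    public static void main(String[] args) {
        // http -> https on the same host: credentials are kept.
        System.out.println(sameHost(
                "http://example.com/getFile?Name=TYPEA",
                "https://example.com/getFile?Name=TYPEA"));
        // example.com -> S3 bucket host: credentials are dropped.
        System.out.println(sameHost(
                "https://example.com/getFile?Name=TYPEA",
                "https://exampleA.s3.amazonaws.com/TYPEA/TYPEA.file.gz"));
    }
}
```

In the redirect loop, the Authorization header would be set on the new connection only when sameHost(fileDownloadURL, newUrl) is true.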
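-Edit 3-- Even though a single curl works, replacing the HTTP code above with a ProcessBuilder call to curl runs into the same problem. So it looks like a server-side issue, where two near-simultaneous connections from the same IP address cause the server to mix up the hosts. What can I do in Java to help the server tell them apart? Is there some kind of session setting I should set?
curl code: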
String authString = openFeedsUser + ":" + openFeedsPass;
String authParam = "-u";
File downloadFile = new File(fileDownloadDirectory + File.separator + fileDownloadFileName);
downloadFile.createNewFile();
String canonicalDownloadFilePath = downloadFile.getCanonicalPath();
String downloadPathParam = "-o";
ProcessBuilder processBuilder = new ProcessBuilder("curl", "-L", authParam, authString, downloadPathParam, canonicalDownloadFilePath, fileDownloadURL);
// pass curl's stdout/stderr through to this process so a full pipe buffer cannot stall curl
processBuilder.inheritIO();
Log(className, "getting file from URL : " + fileDownloadURL);
Process curl = processBuilder.start();
// wait for the process to end; waitFor() blocks without busy-waiting and throws InterruptedException
int exitCode = curl.waitFor();
Log(className, "CURL Exit code: " + exitCode);
if (exitCode == 0)
{
    Log(className, "File Downloaded.");
}