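I wrote some Java code that connects to S3 and reads the list of files in a bucket. I tested it and it works fine, both locally and on EC2. However, when I run it on EMR, I get this error: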
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
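I suspected ClientConfiguration, so I increased all of the timeouts (Timeout, RequestTimeout and SocketTimeOut) to 5 minutes, but nothing changed and I got the same error again. What is causing this error? The bucket is large, but on EC2 in client mode I can read all of it within 3 minutes! Something must be wrong!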
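One detail from the stack trace below may explain why the ClientConfiguration changes had no effect: the exception is thrown from `ThreadUtils.awaitResult` inside `ApplicationMaster.runDriver`, i.e. by code that waits on a future with a fixed 100,000 ms budget, not by the S3 client. As far as I know, in YARN cluster mode that budget is `spark.yarn.am.waitTime` (default 100s), the time the ApplicationMaster waits for the SparkContext to be initialized; treat that as an assumption to check against the Spark on YARN documentation. The sketch below uses plain `java.util.concurrent` (it is not Spark's code, and `WaitBudgetDemo`/`awaitWithBudget` are hypothetical names) to show that a waiter's timeout fires regardless of any timeouts configured inside the work it waits for.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class WaitBudgetDemo {
    // Runs workMillis of simulated work on another thread and waits at most budgetMillis for it.
    // The budget belongs to the waiter, so nothing configured inside the work can extend it.
    static String awaitWithBudget(long workMillis, long budgetMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                Thread.sleep(workMillis); // stands in for a long S3 listing before the SparkContext exists
                return "ready";
            });
            return result.get(budgetMillis, TimeUnit.MILLISECONDS); // throws TimeoutException if too slow
        } finally {
            pool.shutdownNow(); // interrupt the work if the waiter gave up
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(awaitWithBudget(10, 1000)); // work fits the budget: prints "ready"
        try {
            awaitWithBudget(1000, 50);
        } catch (TimeoutException e) {
            System.out.println("timed out"); // work exceeds the budget
        }
    }
}
```

In the EMR case, the analogue of `workMillis` would be the time the driver spends listing the bucket before it creates its SparkContext.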
Here is my code:
try {
    // S3 returns at most 1,000 versions per response, so larger values here are capped
    ListVersionsRequest request = new ListVersionsRequest()
            .withBucketName(bucketname)
            .withPrefix(prefix)
            .withMaxResults(999999);
    VersionListing versionListing;
    do {
        versionListing = s3.listVersions(request);
        for (S3VersionSummary versionSummary : versionListing.getVersionSummaries()) {
            String key = versionSummary.getKey();
            // with no filter conditions keep every key, otherwise keep only matching keys
            if (conditions.length == 0 || contain_str(key, conditions)) {
                collect.add("s3://" + bucketname + "/" + key);
            }
        }
        // continue the next request from where this page ended
        request.setKeyMarker(versionListing.getNextKeyMarker());
        request.setVersionIdMarker(versionListing.getNextVersionIdMarker());
    } while (versionListing.isTruncated()); // check remaining pagination
} catch (AmazonServiceException ase) {
    System.out.println("Caught an AmazonServiceException, " +
            "which means your request made it " +
            "to Amazon S3, but was rejected with an error response " +
            "for some reason.");
    System.out.println("Error Message: " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code: " + ase.getErrorCode());
    System.out.println("Error Type: " + ase.getErrorType());
    System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException ace) {
    System.out.println("Caught an AmazonClientException, " +
            "which means the client encountered " +
            "an internal error while trying to communicate" +
            " with S3, " +
            "such as not being able to access the network.");
    System.out.println("Error Message: " + ace.getMessage());
}
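// Sample after the try/catch instead of in a finally block: a return inside
// finally would silently discard any exception not caught above.
Collections.shuffle(collect); // randomize the file order
if (collect.size() * percentage < 1) { // if the sample would hold fewer than 1 file, take the whole data
    percentage = 1;
}
List<String> collect2 = collect.subList(0, (int) (collect.size() * percentage)); // take the samples
return String.join(",", collect2);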
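The filtering inside the loop and the sampling step at the end can be exercised without S3 at all. This is a minimal sketch under stated assumptions: `contain_str` is not shown in the question, so `containStr` below assumes it means "the key contains any of the condition strings"; the class name `KeySampling` and the bucket `my-bucket` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class KeySampling {
    // Hypothetical stand-in for contain_str: true if the key contains ANY condition string
    // (the original helper is not shown, so this matching rule is an assumption).
    static boolean containStr(String key, String[] conditions) {
        for (String c : conditions) {
            if (key.contains(c)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the loop body: with no conditions keep every key, otherwise keep matches.
    static List<String> toUris(String bucket, List<String> keys, String[] conditions) {
        List<String> uris = new ArrayList<>();
        for (String key : keys) {
            if (conditions.length == 0 || containStr(key, conditions)) {
                uris.add("s3://" + bucket + "/" + key);
            }
        }
        return uris;
    }

    // Mirrors the sampling step: shuffle a copy, fall back to the whole list when the
    // sample would hold fewer than one element, then join the first n URIs with commas.
    static String sample(List<String> uris, double percentage, Random rnd) {
        List<String> shuffled = new ArrayList<>(uris);
        Collections.shuffle(shuffled, rnd);
        if (shuffled.size() * percentage < 1) {
            percentage = 1;
        }
        int n = (int) (shuffled.size() * percentage);
        return String.join(",", shuffled.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("logs/a.gz", "logs/b.gz", "tmp/c.txt");
        List<String> uris = toUris("my-bucket", keys, new String[] {".gz"});
        System.out.println(uris); // [s3://my-bucket/logs/a.gz, s3://my-bucket/logs/b.gz]
        System.out.println(sample(uris, 0.5, new Random(42))); // one of the two URIs
    }
}
```

Passing a seeded `Random` makes the shuffle reproducible in tests; the original code calls `Collections.shuffle(collect)` without one, which is fine when a different sample on each run is intended.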
Error:
18/01/04 10:36:31 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by