java.util.concurrent.TimeoutException when reading an S3 bucket from the SDK

Date: 2018-01-04 07:00:56

Tags: java scala apache-spark amazon-s3 emr

I wrote some Java code that connects to S3 and reads the list of files in a bucket. I tested the code and it works fine (both locally and on EC2). However, when I run it on EMR, I get this error:

    java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]

I suspected the ClientConfiguration might be the cause, so I increased all of the timeouts (Timeout, RequestTimeout and SocketTimeout) to 5 minutes, but nothing changed and I got the same error again. What is the cause of this error? The bucket's contents are large, but in client mode on EC2 I can read everything in under 3 minutes! Something must be wrong.
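For context, the timeout bump described above can be applied on the v1 AWS SDK for Java roughly like this (a sketch, not the exact code from the question; the client-construction details are assumptions):

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ClientWithTimeouts {
    public static AmazonS3 build() {
        int fiveMinutesMs = 5 * 60 * 1000;
        ClientConfiguration cfg = new ClientConfiguration();
        cfg.setConnectionTimeout(fiveMinutesMs); // TCP connect timeout
        cfg.setSocketTimeout(fiveMinutesMs);     // timeout for reading from an open socket
        cfg.setRequestTimeout(fiveMinutesMs);    // timeout for a single HTTP request
        return AmazonS3ClientBuilder.standard()
                .withClientConfiguration(cfg)
                .build();
    }
}
```

These settings only govern the SDK's HTTP calls to S3, which is why raising them has no effect here: the stack trace below comes from Spark's own wait logic, not from the S3 client.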

Here is my code:

    try {
        ListVersionsRequest request = new ListVersionsRequest()
                .withBucketName(bucketname)
                .withPrefix(prefix)
                .withMaxResults(999999); // note: S3 returns at most 1000 versions per page regardless

        VersionListing versionListing;
        do {
            versionListing = s3.listVersions(request);
            for (S3VersionSummary versionSummary : versionListing.getVersionSummaries()) {
                // keep every key when no conditions are given; otherwise keep only matching keys
                if (conditions.length == 0 || contain_str(versionSummary.getKey(), conditions)) {
                    collect.add("s3://" + bucketname + "/" + versionSummary.getKey());
                }
            }
            // advance the markers to the next page of results
            request.setKeyMarker(versionListing.getNextKeyMarker());
            request.setVersionIdMarker(versionListing.getNextVersionIdMarker());
        } while (versionListing.isTruncated()); // keep paging while the listing is truncated
    } catch (AmazonServiceException ase) {
        System.out.println("Caught an AmazonServiceException, " +
                "which means your request made it " +
                "to Amazon S3, but was rejected with an error response " +
                "for some reason.");
        System.out.println("Error Message:    " + ase.getMessage());
        System.out.println("HTTP Status Code: " + ase.getStatusCode());
        System.out.println("AWS Error Code:   " + ase.getErrorCode());
        System.out.println("Error Type:       " + ase.getErrorType());
        System.out.println("Request ID:       " + ase.getRequestId());
    } catch (AmazonClientException ace) {
        System.out.println("Caught an AmazonClientException, " +
                "which means the client encountered " +
                "an internal error while trying to communicate with S3, " +
                "such as not being able to access the network.");
        System.out.println("Error Message: " + ace.getMessage());
    } finally {
        Collections.shuffle(collect);             // randomize the file order
        if (collect.size() * percentage < 1) {    // if the sample size would be less than 1, take everything
            percentage = 1;
        }
        collect2 = collect.subList(0, (int) (collect.size() * percentage)); // take the samples
        return String.join(",", collect2);        // note: returning from finally swallows any pending exception
    }
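The `contain_str` helper is not shown in the question; a minimal self-contained sketch of what such a helper presumably does (return true when the key contains at least one of the condition substrings — plain substring matching is an assumption):

```java
// Hypothetical sketch of the contain_str helper referenced above.
public class KeyFilter {
    static boolean contain_str(String key, String[] conditions) {
        for (String c : conditions) {
            if (key.contains(c)) {
                return true; // key matches at least one condition substring
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] conditions = {"2018-01", ".parquet"};
        System.out.println(contain_str("logs/2018-01/part-0000.gz", conditions)); // true
        System.out.println(contain_str("logs/2017-12/part-0000.gz", conditions)); // false
    }
}
```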

The error:

18/01/04 10:36:31 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
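The 100000 ms in the trace matches the default of Spark's `spark.yarn.am.waitTime` (100s): in cluster mode the YARN ApplicationMaster waits that long for the user code to initialize the SparkContext, and the long S3 listing here runs before the SparkContext is created. One plausible workaround is to raise that property (a sketch, assuming spark-submit on EMR; 600s is an arbitrary example value):

```shell
# Hypothetical workaround: give the YARN ApplicationMaster more time to wait
# for SparkContext initialization (spark.yarn.am.waitTime defaults to 100s).
spark-submit \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=600s \
  ...
```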

Thanks

0 Answers:

No answers yet