Spark job fails with "Unable to execute HTTP request" when reading objects from S3

Time: 2019-05-02 06:16:26

Tags: amazon-s3 amazon-emr

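I am trying to read from S3: given a bucket and a key, I fetch the object and get its input stream, i.e. an S3ObjectInputStream.

Any insight into why I am running into this problem? The job runs fine locally, but when I run it on EMR I get the error below.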

Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool

One thing I tried was closing the S3Object (s3Object.close) before returning the value. But then I got Exception in thread "main" java.io.IOException: Attempted read on closed stream.

So I gave up on that...

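  import com.amazonaws.services.s3.AmazonS3
  import com.amazonaws.services.s3.model.S3ObjectInputStream

  // Fetches the object and returns its content stream; nothing here closes the stream.
  def getS3Object(s3Client: AmazonS3, bucketName: String, key: String): S3ObjectInputStream = {
    val s3Object = s3Client.getObject(bucketName, key)
    val objectContent = s3Object.getObjectContent
    objectContent
  }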
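
A possible explanation, offered as an assumption rather than a confirmed diagnosis: each S3ObjectInputStream holds an HTTP connection from the client's pool until it is closed, and the AWS SDK for Java v1 pool defaults to 50 connections. If the streams returned by getS3Object are never closed, the pool can run dry, which would match the timeout above. The sketch below shows one way to read a stream to the end and only then close it, which also avoids the "Attempted read on closed stream" error. StreamUtil and readFullyAndClose are hypothetical names, not part of the AWS SDK or the original code.

```scala
import java.io.{ByteArrayOutputStream, InputStream}

object StreamUtil {
  /** Reads the whole stream into memory, then closes it even if reading fails. */
  def readFullyAndClose(in: InputStream): Array[Byte] = {
    try {
      val out = new ByteArrayOutputStream()
      val buffer = new Array[Byte](8192)
      var n = in.read(buffer)
      while (n != -1) {
        out.write(buffer, 0, n)
        n = in.read(buffer)
      }
      out.toByteArray
    } finally {
      // For an S3ObjectInputStream, closing is what hands the
      // HTTP connection back to the client's pool.
      in.close()
    }
  }
}
```

With the code from the question, this could be called as StreamUtil.readFullyAndClose(getS3Object(s3Client, bucketName, key)), so each connection is released before the next object is requested. Buffering a whole object in memory only suits objects that fit comfortably in the heap.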
  

Exception in thread "main" com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1467)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1326)
    at content.spark.ContentIngestion.getS3Object(ContentIngestion.scala:54)
    at content.spark.ContentIngestion$$anonfun$1.apply(ContentIngestion.scala:45)
    at content.spark.ContentIngestion$$anonfun$1.apply(ContentIngestion.scala:45)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at content.spark.ContentIngestion.getSolrDocuments(ContentIngestion.scala:45)
    at content.spark.Main$.main(Main.scala:57)
    at content.spark.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool

0 answers:

No answers yet