Using Spark (2.11) on S3 (Java, Spark standalone)
I get org.apache.http.NoHttpResponseException: my-bucket.s3.amazonaws.com:443 failed to respond when I call saveAsTextFile on a large RDD (~20M records).
I have the following code:
myRdd.saveAsTextFile(myDir);
When I run it, I have two problems:
1) When it works, it is very slow.
2) About 10% of the time I get the following exception:
2019-02-18 18:51:42,820 [my-app] [s3a-transfer-shared--pool9-t331] INFO com.amazonaws.http.AmazonHttpClient - Unable to execute HTTP request: my-bucket.s3.amazonaws.com:443 failed to respond
org.apache.http.NoHttpResponseException: my-bucket.s3.amazonaws.com:443 failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:259)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:209)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:488)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Any ideas on how I can solve this?
Thanks, Nissan