NoSuchMethodError when running an AWS S3 client on Spark, while javap says otherwise

Date: 2014-07-16 19:19:30

Tags: java amazon-s3 httpclient apache-spark

I'm running into a runtime problem with a piece of code that runs on top of Apache Spark. I rely on the AWS SDK to upload files to S3, and that is where the NoSuchMethodError comes from. Note that I am using an uber jar that bundles the Spark dependencies. The error when running the code:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:165)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:357)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:339)
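For context, here is a minimal sketch of the kind of upload code that triggers this (the bucket, key, file, and credentials are placeholders for illustration, not my actual code):

import java.io.File;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;

public class S3Upload {
    public static void main(String[] args) {
        // Constructing the client is what ends up in
        // PoolingClientConnectionManager and throws the error above.
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        s3.putObject("my-bucket", "my-key", new File("local-file.txt"));
    }
}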

However, when I inspect the method signatures in the jar with javap, I can see the constructor clearly listed:

vagrant@mesos:~/installs/spark-1.0.1-bin-hadoop2$ javap -classpath /tmp/rickshaw-spark-0.0.1-SNAPSHOT.jar org.apache.http.impl.conn.DefaultClientConnectionOperator
Compiled from "DefaultClientConnectionOperator.java"
public class org.apache.http.impl.conn.DefaultClientConnectionOperator implements org.apache.http.conn.ClientConnectionOperator {
protected final org.apache.http.conn.scheme.SchemeRegistry schemeRegistry;
protected final org.apache.http.conn.DnsResolver dnsResolver;
public org.apache.http.impl.conn.DefaultClientConnectionOperator(org.apache.http.conn.scheme.SchemeRegistry);
public org.apache.http.impl.conn.DefaultClientConnectionOperator(org.apache.http.conn.scheme.SchemeRegistry, org.apache.http.conn.DnsResolver); <-- it exists!
public org.apache.http.conn.OperatedClientConnection createConnection();
public void openConnection(org.apache.http.conn.OperatedClientConnection, org.apache.http.HttpHost, java.net.InetAddress, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
public void updateSecureConnection(org.apache.http.conn.OperatedClientConnection, org.apache.http.HttpHost, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
protected void prepareSocket(java.net.Socket, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
protected java.net.InetAddress[] resolveHostname(java.lang.String) throws java.net.UnknownHostException;

}

I checked some of the other jars in the Spark distribution - they do not seem to contain this particular method signature. So I am wondering what the Spark runtime is picking up that causes the problem. The jar is built from a Maven project in which I ordered the dependencies to make sure the correct aws-java-sdk dependency gets picked up as well.
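One way to check which jar a class is actually resolved from at runtime is to print its code source. A small debugging sketch (the class name WhichJar is mine, not part of the build):

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar (or directory) the JVM loaded the class from,
        // which can differ from the jar javap was pointed at on disk.
        System.out.println(
                org.apache.http.impl.conn.DefaultClientConnectionOperator.class
                        .getProtectionDomain().getCodeSource().getLocation());
    }
}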

1 Answer:

Answer 0 (score: 2)

The Spark 1.0.x distribution already includes an incompatible version of DefaultClientConnectionOperator, and there is no easy way to replace it.

The only workaround I have found is to include a custom implementation of PoolingClientConnectionManager that avoids calling the missing constructor.

Replacing:

return new DefaultClientConnectionOperator(schreg, this.dnsResolver);

with:

return new DefaultClientConnectionOperator(schreg);
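For context, that line lives in createConnectionOperator. A sketch of the changed method, assuming you copy the full PoolingClientConnectionManager source into your own project under the same package and class name (as the gist linked below does) and alter only this one method:

protected ClientConnectionOperator createConnectionOperator(final SchemeRegistry schreg) {
    // The (SchemeRegistry, DnsResolver) constructor is missing from the
    // httpclient classes that Spark 1.0.x ships, so fall back to the
    // single-argument constructor and ignore the DnsResolver.
    return new DefaultClientConnectionOperator(schreg);
}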

You need to make sure your class is the one that gets included in the assembled jar, for example with this sbt-assembly merge strategy:

case PathList("org", "apache", "http", "impl", xs @ _*) => MergeStrategy.first

Custom PoolingClientConnectionManager: https://gist.github.com/felixgborrego/568f3460d82d9c12e23c