I have a Java Spark process that I invoke with: spark-submit --class MyClass target/MyJar.jar
The last part of the process writes the output locally and then copies it to S3, because the file needs to have a specific name (it could also be written to S3 and mv'ed there; for the purposes of this question the error stays the same either way).
The code compiles and runs, but when it reaches the snippet below I get the following error:
java.lang.NoClassDefFoundError: com/amazonaws/services/s3/AmazonS3
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:700)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.amazonaws.services.s3.AmazonS3
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
This is the snippet in question:
import java.io.File;
import java.io.IOException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.PutObjectRequest;

public static void saveToS3(Dataset<Row> df, String outputBucket, String outputPath) throws IOException {
    // Write the DataFrame as a single CSV into a temporary local directory.
    String tmpFile = "temp" + Long.toString(System.nanoTime());
    df.coalesce(1).write().option("header", true).csv(tmpFile);
    File directory = new File(tmpFile);
    AmazonS3 s3client = new AmazonS3Client();
    for (File file : directory.listFiles()) {
        // Spark names its output files "part-*"; upload the CSV under the desired key.
        if (file.getName().startsWith("part-") || file.getName().endsWith("csv")) {
            s3client.putObject(new PutObjectRequest(outputBucket, getS3path(outputPath), file));
        }
        file.delete();
    }
    directory.delete();
}
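For reference, the method above is called from the driver roughly like this (a minimal sketch; the SparkSession setup, input path, bucket and key are placeholders, not my real values, and main is assumed to live in the same class as saveToS3):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MyClass {
    public static void main(String[] args) throws Exception {
        // Placeholder session; the real job builds its DataFrame elsewhere.
        SparkSession spark = SparkSession.builder().appName("MyClass").getOrCreate();
        Dataset<Row> df = spark.read().option("header", true).csv("s3a://some-input-bucket/input.csv");
        // Write locally, then upload the single CSV to S3 under the required name.
        saveToS3(df, "some-output-bucket", "reports/output.csv");
        spark.stop();
    }
}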
Any ideas?

These are the possibly relevant dependencies from my dependency tree:
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.7.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.7.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>servlet-api-2.5</artifactId>
        </exclusion>
    </exclusions>
</dependency>
Answer 0 (score: 0)
You seem to be using a really old version of the AWS SDK (1.7.4), which is from 2014. Your Apache Spark is 2.1, which is from 2016.

The commit history for AmazonS3Client only goes back to 1.9, so that class may well not exist in your version of the AWS SDK.

The reason your code compiles is that your code itself isn't the problem; rather, one of your dependencies is trying to use a newer S3 client that the old SDK doesn't have. In other words, this isn't a code problem, it's a dependency-management problem. If you don't use the SDK directly at all, it is safe to move to the more appropriate AWS SDK 1.11.x; if you do, you may have to massage your code a little.

Normally the spark-core module would declare its own dependency on an AWS SDK version, but I believe it is deliberately left out so that you can supply whatever SDK version you need without any conflicts, with the caveat that if you use something too far out of sync, it will break.
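For example, if you do call the SDK directly and move to 1.11.x, the client construction in your snippet changes slightly; a rough sketch of what that could look like (the builder-based style is what 1.11.x favours over "new AmazonS3Client()", and the class and parameter names here are just illustrative):

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class S3Uploader {
    public static void upload(String bucket, String key, File file) {
        // In 1.11.x the builder is the preferred way to create the client;
        // credentials and region come from the default provider chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
        s3.putObject(new PutObjectRequest(bucket, key, file));
    }
}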
Answer 1 (score: 0)
After massaging my pom file I was able to reach a stable build and package it with the pom below, using:
mvn clean compile assembly:single;
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.group</groupId>
    <artifactId>arifact</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <java.version>1.8</java.version>
        <jdk.version>1.8</jdk.version>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.1.1</version>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <version>1.11.229</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>${jdk.version}</source>
                    <target>${jdk.version}</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.group.artifact.MainClass</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
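For completeness, the assembled jar can then be submitted the same way as before; assuming the assembly plugin's default jar-with-dependencies naming (this name is an assumption, not copied from my actual build output), that would be something like:

spark-submit --class com.group.artifact.MainClass target/arifact-1.0-SNAPSHOT-jar-with-dependencies.jar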