When I try to create an H2O context via Spark 1.6.3, I get an exception in my code:
17/11/06 12:01:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[H2O Launcher thread,5,main]
java.lang.NoSuchMethodError: org.joda.time.DateTime.now()Lorg/joda/time/DateTime;
at water.util.Timer.nowAsLogString(Timer.java:38)
at water.util.Log.header(Log.java:163)
at water.util.Log.write0(Log.java:131)
at water.util.Log.write0(Log.java:124)
at water.util.Log.write(Log.java:109)
at water.util.Log.log(Log.java:86)
at water.util.Log.info(Log.java:72)
at water.H2OSecurityManager.<init>(H2OSecurityManager.java:57)
at water.H2OSecurityManager.instance(H2OSecurityManager.java:79)
at water.H2ONode.<init>(H2ONode.java:127)
Edit: I have attached my POM file. It is long, but it shows the dependencies; I suspect the problem lies in my dependencies.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>au.com.vroc.mdm</groupId>
<artifactId>mdm</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<java.version>1.8</java.version>
<gson.version>2.8.0</gson.version>
<java.home>${env.JAVA_HOME}</java.home>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.1.1</version>
<!-- <scope>provided</scope> -->
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/ai.h2o/h2o-core -->
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-core</artifactId>
<version>3.14.0.7</version>
<!-- <scope>runtime</scope> -->
</dependency>
<!-- https://mvnrepository.com/artifact/ai.h2o/h2o-algos -->
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-algos</artifactId>
<version>3.14.0.7</version>
<!-- <scope>runtime</scope> -->
</dependency>
<!-- https://mvnrepository.com/artifact/ai.h2o/h2o-genmodel -->
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-genmodel</artifactId>
<version>3.14.0.7</version>
<!-- <scope>runtime</scope> -->
</dependency>
<!-- https://mvnrepository.com/artifact/ai.h2o/sparkling-water-core_2.10 -->
<dependency>
<!-- <groupId>ai.h2o</groupId> <artifactId>sparkling-water-core_2.10</artifactId>
<version>1.6.11</version> -->
<groupId>ai.h2o</groupId>
<artifactId>sparkling-water-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>${gson.version}</version>
</dependency>
<dependency>
<groupId>com.cloudera.livy</groupId>
<artifactId>livy-client-http</artifactId>
<version>0.3.0</version>
</dependency>
<dependency>
<groupId>com.cloudera.livy</groupId>
<artifactId>livy-api</artifactId>
<version>0.3.0</version>
</dependency>
<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>fastutil</artifactId>
<version>7.1.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.5</version>
</dependency>
<!-- <dependency> <groupId>jdk.tools</groupId> <artifactId>jdk.tools</artifactId>
<scope>system</scope> <version>1.8</version> <systemPath>${java.home}/lib/tools.jar</systemPath>
</dependency> -->
<!-- https://mvnrepository.com/artifact/joda-time/joda-time -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-spark</artifactId>
<version>4.7.0-HBase-1.1</version>
<!-- <scope>provided</scope> -->
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.4.0</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>cloudera.repo</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<name>Cloudera Repositories</name>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<repository>
<id>Local repository</id>
<url>file://${basedir}/lib</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.5.2</version>
<!-- <version>3.0.0</version> -->
<configuration>
<!-- get all project dependencies -->
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<!--<id>assemble-all</id> -->
<!-- bind to the packaging phase -->
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Model creation is done entirely through the Livy client, as follows:
public RegressionMetric call(JobContext ctx) throws Exception {
    if (!checkInputValid()) {
        throw new IllegalArgumentException("Mandatory parameters are not set");
    } else {
        RegressionMetric metric = new RegressionMetric();
        Dataset<Row> sensordataDF = this.InitializeH2OModel(ctx);
        SQLContext hc = ctx.sqlctx();
        // Save the H2OContext so that we can extract the H2OFrames later
        H2OContext h2oContext = H2OContext.getOrCreate(ctx.sc().sc());
        // ...
    }
}
In the above, InitializeH2OModel(ctx) is a complex function that generates the Spark DataFrames used to train the model. The program runs fine until it reaches the line that starts the H2O context: "H2OContext h2oContext = H2OContext.getOrCreate(ctx.sc().sc());"
The configuration parameters I pass to Livy are as follows:
LivyClient client = new LivyClientBuilder().setURI(new URI(livyUrl)).setConf("spark.executor.instances", "9")
.setConf("spark.driver.memory", "20g")
.setConf("spark.driver.cores", "5")
.setConf("spark.executor.memory", "16g") // memory per executor
.setConf("spark.executor.cores", "5")
.setConf("spark.yarn.executor.memoryOverhead", "7000")
.setConf("spark.rdd.compress", "true")
.setConf("spark.default.parallelism", "3000")
.setConf("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.setConf("spark.driver.extraJavaOptions", "-XX:+UseG1GC -XX:MaxPermSize=10000m -Xss5000m")
.setConf("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:MaxPermSize=10000m -Xss5000m")
.setConf("spark.shuffle.compress", "true")
.setConf("spark.shuffle.spill.compress", "true")
.setConf("spark.kryoserializer.buffer.max", "1g")
.setConf("spark.shuffle.io.maxRetries", "6")
.setConf("spark.sql.shuffle.partitions", "7000")
.setConf("spark.sql.files.maxPartitionBytes", "5000")
.setConf("spark.driver.extraClassPath",
"/usr/hdp/2.6.2.0-205/phoenix/phoenix-4.7.0.2.6.2.0-205-client.jar:/usr/hdp/2.6.2.0-205/phoenix/phoenix-4.7.0.2.6.2.0-205-server.jar:/usr/hdp/2.6.2.0-205/phoenix/lib/phoenix-spark-4.7.0.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-common-1.1.2.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-server-1.1.2.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-server-1.1.2.2.6.2.0-205")
.setConf("spark.executor.extraClassPath",
"/usr/hdp/2.6.2.0-205/phoenix/phoenix-4.7.0.2.6.2.0-205-client.jar:/usr/hdp/2.6.2.0-205/phoenix/phoenix-4.7.0.2.6.2.0-205-server.jar:/usr/hdp/2.6.2.0-205/phoenix/lib/phoenix-spark-4.7.0.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-common-1.1.2.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-server-1.1.2.2.6.2.0-205.jar:/usr/hdp/2.6.2.0-205/hbase/lib/hbase-server-1.1.2.2.6.2.0-205")
.setConf("spark.ext.h2o.cluster.size", "-1")
.setConf("spark.ext.h2o.cloud.timeout", "60000")
.setConf("spark.ext.h2o.spreadrdd.retries", "-1")
.setConf("spark.ext.h2o.nthreads", "-1")
.setConf("spark.ext.h2o.disable.ga", "true")
.setConf("spark.ext.h2o.dummy.rdd.mul.factor", "10")
.setConf("spark.ext.h2o.fail.on.unsupported.spark.param", "false")
.setConf("spark.cassandra.input.split.size_in_mb", "64")
.setConf("spark.driver.maxResultSize", "3g")
.setConf("spark.network.timeout", "1000s")
.setConf("spark.executor.heartbeatInterval", "600s")
.build();
I am running HDP 2.6.2 in cluster mode with Spark 2.1.1.
Answer (score: 2)
Are you using Spark 2.1 or Spark 1.6? At the very beginning of the question you refer to Spark 1.6, but the code references Spark 2.1. I will assume it is 2.1.
Regarding your problem: you are mixing versions in your pom file. You specify a dependency on H2O 3.14.0.7, but you are using Sparkling Water 2.1.1, which is based on H2O 3.10.4.2. These two versions require different versions of the Joda-Time library, which is why you see the error above.
The solution is to specify only the Sparkling Water dependencies in your pom file. H2O is already bundled inside Sparkling Water, so you should not specify it explicitly.
The dependencies you should put in your pom file are:
ai.h2o:sparkling-water-core_2.11:2.1.16
ai.h2o:sparkling-water-examples_2.11:2.1.16
no.priv.garshol.duke:duke:1.2
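As a sketch, those coordinates would translate into Maven `<dependency>` entries roughly like the following (versions taken from the list above; the explicit h2o-core, h2o-algos, and h2o-genmodel dependencies should be removed at the same time):

```xml
<!-- Sparkling Water brings in a matching H2O, so no explicit ai.h2o:h2o-* entries -->
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>sparkling-water-core_2.11</artifactId>
    <version>2.1.16</version>
</dependency>
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>sparkling-water-examples_2.11</artifactId>
    <version>2.1.16</version>
</dependency>
<dependency>
    <groupId>no.priv.garshol.duke</groupId>
    <artifactId>duke</artifactId>
    <version>1.2</version>
</dependency>
```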
Also, it is recommended to use the latest Sparkling Water release; for Spark 2.1.x that is Sparkling Water 2.1.16.
We are working on this PR https://github.com/h2oai/sparkling-water/pull/352, which will simplify this: instead of these 3 dependencies, you will be able to specify just one uber dependency:
ai.h2o:sparkling-water-package_2.11:2.1.16
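In pom form (again a sketch based on the coordinates above), that single uber dependency would look like:

```xml
<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>sparkling-water-package_2.11</artifactId>
    <version>2.1.16</version>
</dependency>
```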