Ignite Spark save to cache fails in YARN mode

Date: 2019-05-17 17:38:15

Tags: apache-spark ignite

I am using Ignite 2.3 and trying to load data from HDFS Parquet files into an Ignite cache via Spark. The data loads fine, but it cannot be saved to the Ignite cache.
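For context, the load step looks roughly like this (an illustrative sketch only; the Parquet path, column names, and case-class fields are placeholders, not the real ones):

// Placeholder key/value types -- the real classes have more fields.
case class MyDataKey(id: Long)
case class MyData(id: Long, value: String)

// Reading the Parquet file from HDFS works in both local and YARN mode.
val df = spark.read.parquet("hdfs:///data/my_table")

// Convert rows into (key, value) pairs for Ignite; column names assumed.
val myDataRDD = df.rdd.map { row =>
  (MyDataKey(row.getAs[Long]("id")),
   MyData(row.getAs[Long]("id"), row.getAs[String]("value")))
}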

If we use local mode instead of Spark YARN mode, everything works fine. My questions: in YARN mode, where the computation is distributed across the cluster, why does Ignite still require IGNITE_HOME? How can this be fixed? Is there any working example for YARN mode?

My environment is JDK 8, Ignite 2.3, and Spark 2.1.0.cloudera1.

My spark-shell invocation is:

env JAVA_HOME="/usr/java/jdk1.8.0_181-amd64" spark2-shell \
  --master yarn \
  --deploy-mode client \
  --driver-memory 12g \
  --conf spark.driver.extraJavaOptions="-DmavenTest=true" \
  --driver-class-path ~/libs/resources \
  --jars my.jar

The data is loaded through Spark, but saving it to the Ignite cache fails:

import scala.collection.JavaConverters._

import org.apache.ignite.cache.CacheAtomicityMode.ATOMIC
import org.apache.ignite.cache.CacheMode.PARTITIONED
import org.apache.ignite.cache.CacheRebalanceMode.ASYNC
import org.apache.ignite.configuration.{CacheConfiguration, DeploymentMode, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

val myDataRDD = ... // Spark RDD of (MyDataKey, MyData) pairs
val igniteCluster = "10.xxx.xxx.xxx:64789"

// The configuration closure is serialized and evaluated on every Spark
// executor, where it starts an Ignite client node.
val igniteContext = new IgniteContext(spark.sparkContext, () => new IgniteConfiguration()
      .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(
        new TcpDiscoveryVmIpFinder().setAddresses(List(igniteCluster).asJava)))
      .setWorkDirectory("/tmp/service")
      .setClientMode(true)
      .setDeploymentMode(DeploymentMode.CONTINUOUS)
      .setPeerClassLoadingEnabled(true)
)

val cfg: CacheConfiguration[MyDataKey, MyData] = new CacheConfiguration("cacheName")
cfg.setIndexedTypes(classOf[MyDataKey], classOf[MyData])
cfg.setCacheMode(PARTITIONED)
cfg.setAtomicityMode(ATOMIC)
cfg.setStatisticsEnabled(true)
cfg.setBackups(1)
cfg.setRebalanceMode(ASYNC)
cfg.setRebalanceBatchSize(2 * 1024 * 1024) // 2 MB
cfg.setRebalanceBatchesPrefetchCount(8)

val cacheRDD = igniteContext.fromCache[MyDataKey, MyData](cfg)
cacheRDD.savePairs(myDataRDD, true) // fails here in YARN mode

The error message:

Caused by: org.apache.ignite.IgniteException: Invalid Ignite installation home folder: /myproject/ROOT
  at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:966)
  at org.apache.ignite.Ignition.getOrStart(Ignition.java:417)
  at org.apache.ignite.spark.IgniteContext.ignite(IgniteContext.scala:143)
  at org.apache.ignite.spark.impl.IgniteAbstractRDD.ensureCache(IgniteAbstractRDD.scala:39)
  at org.apache.ignite.spark.IgniteRDD.compute(IgniteRDD.scala:59)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
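Note that the exception is raised inside Executor$TaskRunner, i.e. the configuration closure runs on each YARN executor, and Ignite tries to resolve its installation home folder there. One idea I plan to try (an untested sketch; /opt/ignite is an assumed path that would have to exist on every NodeManager host) is to pin the home directory explicitly inside the closure:

// Untested sketch: set IGNITE_HOME explicitly so each executor resolves a
// valid folder instead of the YARN container's working directory.
// "/opt/ignite" is a placeholder -- it must exist on every cluster node.
val igniteContext = new IgniteContext(spark.sparkContext, () => {
  System.setProperty("IGNITE_HOME", "/opt/ignite") // read by Ignition at node startup
  new IgniteConfiguration()
    .setIgniteHome("/opt/ignite") // equivalently, set it directly on the configuration
    .setWorkDirectory("/tmp/service")
    .setClientMode(true)
})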

0 Answers:

No answers yet.