I am using Ignite 2.3 and trying to load data from HDFS Parquet files into an Ignite cache via Spark. The data loads fine, but saving it to the Ignite cache fails.
Everything works if we run Spark in local mode instead of YARN mode. My question is: in YARN mode, where the computation is distributed across the cluster, why does Ignite still require IGNITE_HOME, and how can I work around it? Is there any working example for YARN mode?
My environment is JDK 8, Ignite 2.3, and Spark 2.1.0.cloudera1.
My spark-shell invocation is:
env JAVA_HOME="/usr/java/jdk1.8.0_181-amd64" spark2-shell \
--master yarn \
--deploy-mode client \
--driver-memory 12g \
--conf spark.driver.extraJavaOptions="-DmavenTest=true" \
--driver-class-path ~/libs/resources \
--jars my.jar
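
One variant I also plan to try, though I have not verified it yet, is exporting IGNITE_HOME to every executor through Spark's spark.executorEnv passthrough, i.e. adding a flag like this to the command above (/opt/ignite is just a placeholder path):

--conf spark.executorEnv.IGNITE_HOME=/opt/ignite \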
The data is loaded through Spark, but saving it to the Ignite cache fails:
import scala.collection.JavaConverters._
import org.apache.ignite.cache.CacheAtomicityMode.ATOMIC
import org.apache.ignite.cache.CacheMode.PARTITIONED
import org.apache.ignite.cache.CacheRebalanceMode.ASYNC
import org.apache.ignite.configuration.{CacheConfiguration, DeploymentMode, IgniteConfiguration}
import org.apache.ignite.spark.IgniteContext
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

val myDataRDD = ... // Spark pair RDD of (MyDataKey, MyData)
val igniteCluster = "10.xxx.xxx.xxx:64789"
// Client-mode configuration; the closure is re-evaluated on each executor.
val igniteContext = new IgniteContext(spark.sparkContext, () => new IgniteConfiguration()
  .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(
    new TcpDiscoveryVmIpFinder().setAddresses(List(igniteCluster).asJava)))
  .setWorkDirectory("/tmp/service")
  .setClientMode(true)
  .setDeploymentMode(DeploymentMode.CONTINUOUS)
  .setPeerClassLoadingEnabled(true)
)

val cfg = new CacheConfiguration[MyDataKey, MyData]("cacheName")
cfg.setIndexedTypes(classOf[MyDataKey], classOf[MyData])
cfg.setCacheMode(PARTITIONED)
cfg.setAtomicityMode(ATOMIC)
cfg.setStatisticsEnabled(true)
cfg.setBackups(1)
cfg.setRebalanceMode(ASYNC)
cfg.setRebalanceBatchSize(2 * 1024 * 1024) // 2 MB
cfg.setRebalanceBatchesPrefetchCount(8)

val cacheRDD = igniteContext.fromCache[MyDataKey, MyData](cfg)
cacheRDD.savePairs(myDataRDD, true) // fails here
Error message:
Caused by: org.apache.ignite.IgniteException: Invalid Ignite installation home folder: /myproject/ROOT
at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:966)
at org.apache.ignite.Ignition.getOrStart(Ignition.java:417)
at org.apache.ignite.spark.IgniteContext.ignite(IgniteContext.scala:143)
at org.apache.ignite.spark.impl.IgniteAbstractRDD.ensureCache(IgniteAbstractRDD.scala:39)
at org.apache.ignite.spark.IgniteRDD.compute(IgniteRDD.scala:59)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
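
For reference, one workaround I am considering (only a sketch, not verified; "/tmp/ignite" is a placeholder path I made up) is to pin the home folder explicitly inside the configuration closure, so every executor gets it instead of trying to auto-detect one:

// Unverified sketch: set the Ignite home folder in the closure itself,
// so it is applied on every executor ("/tmp/ignite" is a placeholder).
val igniteContextWithHome = new IgniteContext(spark.sparkContext, () => new IgniteConfiguration()
  .setIgniteHome("/tmp/ignite")
  .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(
    new TcpDiscoveryVmIpFinder().setAddresses(List(igniteCluster).asJava)))
  .setWorkDirectory("/tmp/service")
  .setClientMode(true)
)

Is that the right approach, or is there a cleaner way to run IgniteRDD under YARN?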