SnappyData: connecting a standalone Spark job to an embedded cluster

Time: 2017-10-25 21:53:37

Tags: apache-spark intellij-idea snappydata

What I want to achieve is something like Smart Connector mode, but the documentation is of little help to me, because the Smart Connector examples are based on spark-shell while I am trying to run a standalone Scala application. So I cannot use the `--conf` arguments meant for spark-shell.

While trying to find my Spark master, I looked at the SnappyData web UI. I found the following:

host-data="false"
 locators="xxx.xxx.xxx.xxx:10334"
 log-file="snappyleader.log"
 mcast-port="0"
 member-timeout="30000"
 persist-dd="false"
 route-query="false"
 server-groups="IMPLICIT_LEADER_SERVERGROUP"
 snappydata.embedded="true"
 spark.app.name="SnappyData"
 spark.closure.serializer="org.apache.spark.serializer.PooledKryoSerializer"
 spark.driver.host="xxx.xxx.xxx.xxx"
 spark.driver.port="37838"
 spark.executor.id="driver"
 spark.local.dir="/var/opt/snappydata/lead1/scratch"
 spark.master="snappydata://xxx.xxx.xxx.xxx:10334"
 spark.memory.manager="org.apache.spark.memory.SnappyUnifiedMemoryManager"
 spark.memory.storageFraction="0.5"
 spark.scheduler.mode="FAIR"
 spark.serializer="org.apache.spark.serializer.PooledKryoSerializer"
 spark.ui.port="5050"
 statistic-archive-file="snappyleader.gfs"
--- end --

(All the IP addresses are on one host for now.)

I have a simple example Spark job, just to test getting my cluster working:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SnappySession
import org.apache.spark.sql.Dataset

object snappytest {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("SnappyTest")
      .master("snappydata://xxx.xxx.xxx.xxx:10334")
      .getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    import spark.implicits._

    val caseClassDS = Seq(Person("Andy", 35)).toDS()
    println(Person)
    println(snappy)
    println(spark)
  }
}

I got this error:

17/10/25 14:44:57 INFO ServerConnector: Started Spark@ffaaaf0{HTTP/1.1}{0.0.0.0:4040}
17/10/25 14:44:57 INFO Server: Started @2743ms
17/10/25 14:44:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/10/25 14:44:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx.xxx.xxx.xxx:4040
17/10/25 14:44:57 INFO SnappyEmbeddedModeClusterManager: setting from url snappydata.store.locators with xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: cluster configuration after overriding certain properties 
jobserver.enabled=false
snappydata.embedded=true
snappydata.store.host-data=false
snappydata.store.locators=xxx.xxx.xxx.xxx:10334
snappydata.store.persist-dd=false
snappydata.store.server-groups=IMPLICIT_LEADER_SERVERGROUP
spark.app.name=SnappyTest
spark.driver.host=xxx.xxx.xxx.xxx
spark.driver.port=35602
spark.executor.id=driver
spark.master=snappydata://xxx.xxx.xxx.xxx:10334
17/10/25 14:44:58 INFO LeadImpl: passing store properties as {spark.driver.host=xxx.xxx.xxx.xxx, snappydata.embedded=true, spark.executor.id=driver, persist-dd=false, spark.app.name=SnappyTest, spark.driver.port=35602, spark.master=snappydata://xxx.xxx.xxx.xxx:10334, member-timeout=30000, host-data=false, default-startup-recovery-delay=120000, server-groups=IMPLICIT_LEADER_SERVERGROUP, locators=xxx.xxx.xxx.xxx:10334}
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
NanoTimer::Problem loading library from URL path: /home/jpride/.ivy2/cache/io.snappydata/gemfire-core/jars/libgemfirexd64.so: java.lang.UnsatisfiedLinkError: no gemfirexd64 in java.library.path
Exception in thread "main" org.apache.spark.SparkException: Primary Lead node (Spark Driver) is already running in the system. You may use smart connector mode to connect to SnappyData cluster.

So how do I (should I?) use Smart Connector mode in this case?

1 Answer:

Answer 0 (score: 2):

You need to specify the following in your example Spark job:

.master("local[*]")
.config("snappydata.connection", "xxx.xxx.xxx.xxx:1527")
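Putting those two settings into the asker's example gives a minimal sketch of a standalone smart-connector application. This is an assumption-laden illustration, not tested against a cluster: the locator host is the placeholder from the question, 1527 is SnappyData's default client port as used in the answer, and the program additionally requires the SnappyData Spark connector artifact on the classpath:

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}

object SnappySmartConnectorTest {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    // Run the Spark driver locally and connect to the SnappyData cluster
    // as a client, instead of trying to start an embedded lead node
    // (which is what caused the "Primary Lead node ... is already
    // running" error above).
    val spark: SparkSession = SparkSession
      .builder()
      .appName("SnappyTest")
      .master("local[*]")
      // Client connection to the SnappyData locator; adjust host/port
      // to match your deployment.
      .config("snappydata.connection", "xxx.xxx.xxx.xxx:1527")
      .getOrCreate()

    val snappy = new SnappySession(spark.sparkContext)

    import spark.implicits._
    val caseClassDS = Seq(Person("Andy", 35)).toDS()
    caseClassDS.show()

    spark.stop()
  }
}
```

The same properties could presumably also be supplied at submit time (e.g. via `spark-submit --conf`), which is how the spark-shell examples in the documentation pass them; setting them in the builder just moves that configuration into the standalone application itself.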