spark setCassandraConf is not working as expected

Date: 2018-10-18 11:50:35

Tags: apache-spark datastax-enterprise cassandra-3.0 databricks

I am setting up a SparkSession with .setCassandraConf(c_options_conf) to connect to a Cassandra cluster, as shown below.

Working fine:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.cassandra._  // provides setCassandraConf on SparkSession

    val spark = SparkSession
      .builder()
      .appName("DatabaseMigrationUtility")
      .config("spark.master", devProps.getString("deploymentMaster"))
      .getOrCreate()
      .setCassandraConf(c_options_conf)

If I save a table using the DataFrame writer object as below, it points to the configured cluster and saves into Cassandra perfectly, as expected:

    writeDfToCassandra(o_vals_df, key_space, "model_vals") // works fine using o_vals_df
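
The body of writeDfToCassandra is not shown in the question; a minimal sketch of such a helper, assuming the connector's standard DataFrame writer API (everything beyond the helper's name and arguments is an assumption), might look like this:

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Hypothetical helper; the original body is not part of the post.
    def writeDfToCassandra(df: DataFrame, keySpace: String, tableName: String): Unit = {
      df.write
        .format("org.apache.spark.sql.cassandra")                    // connector's DataFrame source
        .options(Map("keyspace" -> keySpace, "table" -> tableName))
        .mode(SaveMode.Append)                                       // append to the existing table
        .save()
    }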

But when run as below, it points to localhost instead of the Cassandra cluster and fails to save.

Not working:

    import spark.implicits._
    import com.datastax.spark.connector._  // provides saveToCassandra on RDDs

    val sc = spark.sparkContext

    sc.parallelize(Seq(LogCaseClass(columnFamilyName, status, error_msg,
        currentDate, currentTimeStamp, updated_user)))
      .saveToCassandra(keyspace, columnFamilyName)

It throws the following error while trying to connect to localhost.

Error:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.TransportException:
[localhost/127.0.0.1:9042] Cannot connect))
            at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:233)

What is going wrong here? Why is it pointing to the default localhost even though the SparkSession is set to the Cassandra cluster and the earlier approach works fine?

2 Answers:

Answer 0 (score: 1)

We need to set the configuration using both setter methods of SparkSession, i.e. both .config(conf) and .setCassandraConf(c_options_conf), with the same values, as shown below:

    val spark = SparkSession
      .builder()
      .appName("DatabaseMigrationUtility")
      .config("spark.master", devProps.getString("deploymentMaster"))
      .config("spark.dynamicAllocation.enabled", devProps.getString("spark.dynamicAllocation.enabled"))
      .config("spark.executor.memory", devProps.getString("spark.executor.memory"))
      .config("spark.executor.cores", devProps.getString("spark.executor.cores"))
      .config("spark.executor.instances", devProps.getString("spark.executor.instances"))
      .config(conf)
      .getOrCreate()
      .setCassandraConf(c_options_conf)

Then it works for both the latest Cassandra DataFrame API and the RDD API.
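
The conf object passed to .config(conf) above is never shown in the answer; a minimal sketch of how it could be built, assuming the host is read from the same devProps (the "cassandraHost" key is a hypothetical name, not from the post):

    import org.apache.spark.SparkConf

    // Hypothetical construction of the `conf` passed to .config(conf) above.
    // "cassandraHost" is an assumed devProps key.
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", devProps.getString("cassandraHost"))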

Answer 1 (score: 0)

Setting the IP via the spark.cassandra.connection.host Spark property (instead of via setCassandraConf) works for both RDDs and DataFrames. This property can be set from the command line when submitting the job, or explicitly (example taken from the documentation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "192.168.123.10")
      .set("spark.cassandra.auth.username", "cassandra")
      .set("spark.cassandra.auth.password", "cassandra")

    val sc = new SparkContext("spark://192.168.123.10:7077", "test", conf)
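
For the command-line route mentioned above, a sketch of passing the same property at submit time (the jar and class names here are placeholders, not from the post):

    # Hypothetical spark-submit invocation; jar/class names are placeholders.
    spark-submit \
      --class com.example.DatabaseMigrationUtility \
      --conf spark.cassandra.connection.host=192.168.123.10 \
      migration-utility.jar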

Have a look at the documentation for the connector, including the reference for the available configuration properties.