leftJoinWithCassandraTable using spark-cassandra-connector

Time: 2017-03-15 12:00:22

Tags: apache-spark datastax-enterprise spark-cassandra-connector

How can I do a leftJoinWithCassandraTable against Cassandra using the spark-cassandra-connector? I am using Scala 2.11 / DSE 5.0.3 / Spark 1.6.2.

I guess 2.0 has this method in the RDDFunctions class. Please let me know your thoughts. Thanks in advance.

1 answer:

Answer 0: (score: 0)

Have you checked this leftJoinWithCassandraTable?

  /**
    * Uses the data from [[org.apache.spark.rdd.RDD RDD]] to left join with a Cassandra table without
    * retrieving the entire table.
    * Any RDD which can be used to saveToCassandra can be used to leftJoinWithCassandra as well as any
    * RDD which only specifies the partition Key of a Cassandra Table. This method executes single
    * partition requests against the Cassandra Table and accepts the functional modifiers that a
    * normal [[com.datastax.spark.connector.rdd.CassandraTableScanRDD]] takes.
    *
    * By default this method only uses the Partition Key for joining but any combination of columns
    * which are acceptable to C* can be used in the join. Specify columns using joinColumns as a parameter
    * or the on() method.
    *
    * Example With Prior Repartitioning: {{{
    * val source = sc.parallelize(keys).map(x => new KVRow(x))
    * val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
    * val someCass = repart.leftJoinWithCassandraTable(keyspace, tableName)
    * }}}
    *
    * Example Joining on Clustering Columns: {{{
    * val source = sc.parallelize(keys).map(x => (x, x * 100))
    * val someCass = source.leftJoinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
    * }}}
**/
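
The Scaladoc above shows how to call the method but not what comes back. As a rough sketch, assuming a connector version that actually ships `leftJoinWithCassandraTable` (as the question guesses, this appears to be 2.0 or later) and assuming each left-hand element is paired with an `Option` of the matching row, the result could be consumed as below. The keyspace `test`, the table `users (id int PRIMARY KEY, name text)`, the `UserKey` case class, and the contact point are all hypothetical names chosen for illustration.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical key type; its field name must match the partition key column
// of the (assumed) table: CREATE TABLE test.users (id int PRIMARY KEY, name text)
case class UserKey(id: Int)

object LeftJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("left-join-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed contact point
    val sc = new SparkContext(conf)

    // Keys to look up; some of them may have no row in Cassandra.
    val keys = sc.parallelize(1 to 5).map(UserKey(_))

    // Assumption: requires a connector that provides leftJoinWithCassandraTable.
    // Every key is kept; the right side is None when no row matches.
    val joined = keys.leftJoinWithCassandraTable[CassandraRow]("test", "users")

    joined.collect().foreach {
      case (key, Some(row)) => println(s"${key.id} -> ${row.getString("name")}")
      case (key, None)      => println(s"${key.id} -> no matching row")
    }

    sc.stop()
  }
}
```

This is meant to be launched with `spark-submit` (or `dse spark-submit`), which supplies the master URL. If the connector bundled with Spark 1.6.2 does not include the method, one possible fallback is the inner `joinWithCassandraTable` followed by Spark's own `leftOuterJoin` against the original keys, at the cost of an extra shuffle.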