string1 1 true
string2 2 true
string1 1 true
This cassandraRdd contains all the columns read from my Cassandra table.
CASSANDRA_TABLE has (some_other_column, itemid) as primary key.
val cassandraRdd: CassandraTableScanRDD[CassandraRow] = sparkSession.sparkContext
  .cassandraTable(cassandraKeyspace, cassandraTable)
cassandraRdd.take(10).foreach(println)
After the keyBy operation, neither temp1 nor temp2 retains all of the columns:
val temp1: CassandraTableScanRDD[((String), CassandraRow)] = cassandraRdd
  .select("itemid", "column2", "column3")
  .keyBy[(String)]("itemid")
val temp2: CassandraTableScanRDD[((String), CassandraRow)] = cassandraRdd
  .keyBy[(String)]("itemid")
temp1.take(10).foreach(println)
temp2.take(10).foreach(println)
How can I key by a specific column and have the CassandraRow retain all of the columns?
Answer 0: (score: 0)
To keep the partitioning and get the selected rows, I had to read the Cassandra rows as follows:
val cassandraRdd: CassandraTableScanRDD[((String, String), (String, String, String))] = {
  sparkSession.sparkContext
    .cassandraTable[(String, String, String)](cassandraKeyspace, cassandraTable)
    .select("some_other_column" as "_1", "itemid" as "_2", "column3" as "_3", "some_other_column", "itemid")
    .keyBy[(String, String)]("some_other_column", "itemid")
}
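If keeping the connector's partitioning is not a requirement, a simpler alternative may be to skip the connector's keyBy and pair each row with its key using an ordinary Spark map. The sketch below is not from the original post: the object and method names are hypothetical, and it assumes itemid is a text column. The result is a plain RDD rather than a CassandraTableScanRDD, so whatever partitioning information the connector's keyBy would carry is not kept.

```scala
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper, for illustration only.
object KeyByItemId {
  // Pairs every full CassandraRow with the value of its itemid column.
  // Assumes itemid is stored as text; adjust the getter for other types.
  def rowsKeyedByItemId(
      sc: SparkContext,
      keyspace: String,
      table: String): RDD[(String, CassandraRow)] =
    sc.cassandraTable[CassandraRow](keyspace, table) // no select(): all columns are read
      .map(row => (row.getString("itemid"), row))    // plain Spark map keeps the whole row
}
```

Calling KeyByItemId.rowsKeyedByItemId(sparkSession.sparkContext, cassandraKeyspace, cassandraTable).take(10).foreach(println) should then print pairs whose second element is the complete row.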