Question

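In Java, I have a Spark Dataset (Spark Structured Streaming) with a column of type java.util.ArrayList<Short>, and I want to write this dataset to a Cassandra table that has a corresponding list<smallint> column.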

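Every time I write a row to Cassandra, it updates the existing row. I want to customize the write behavior for the list column to control whether

the written list overwrites the existing list, or
the contents of the written list are appended to the contents of the list already stored in Cassandra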

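In the spark-cassandra-connector source code I found a class CollectionBehavior, which is extended by CollectionAppend and CollectionOverwrite. It looks like exactly what I need, but I cannot find a way to use it when writing to Cassandra.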

The dataset is written to Cassandra with the following code:

dataset.write()
    .format("org.apache.spark.sql.cassandra")
    .option("table", table)
    .option("keyspace", keyspace)
    .mode(SaveMode.Append)
    .save();

Is it possible to change this behavior?

Answer 1

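To choose the save mode of a collection column when saving to Cassandra, use the RDD API. As far as I can tell, the Dataset API does not offer this feature yet. Converting the Dataset to an RDD and saving it with the RDD methods should therefore give you the behavior you want.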

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md
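
To make the two behaviors concrete, here is a minimal plain-Java sketch of what each one does to the stored list. It does not use the connector at all: the class, enum and method names are hypothetical and only model the semantics. In CQL terms, overwrite corresponds to SET col = [...] and append to SET col = col + [...]. If I remember the linked guide correctly, the Scala RDD API selects the append behavior per column with something along the lines of SomeColumns("key", "lcol" append), but check the guide for the exact form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: models overwrite vs. append on a list column.
// None of these names come from the spark-cassandra-connector.
final class CollectionWriteSemantics {

    // Hypothetical stand-in for the two collection write behaviors.
    enum Behavior { OVERWRITE, APPEND }

    // Returns the list a row would hold after a write with the given behavior.
    static List<Short> applyWrite(List<Short> stored, List<Short> written, Behavior behavior) {
        List<Short> result = new ArrayList<>();
        if (behavior == Behavior.APPEND) {
            // CQL equivalent: SET col = col + [written...]
            result.addAll(stored);
        }
        // For OVERWRITE the stored values are dropped (CQL: SET col = [written...]).
        result.addAll(written);
        return result;
    }

    public static void main(String[] args) {
        List<Short> stored = Arrays.asList((short) 1, (short) 2);
        List<Short> written = Arrays.asList((short) 3);

        System.out.println(applyWrite(stored, written, Behavior.OVERWRITE)); // prints [3]
        System.out.println(applyWrite(stored, written, Behavior.APPEND));    // prints [1, 2, 3]
    }
}
```

With SaveMode.Append on the Dataset writer, the row is upserted and the list column ends up like the OVERWRITE case above; the APPEND case is what the RDD-level column selector is meant to give you.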

Spark-Cassandra connector: how to change the write behavior of collections
