Spark Cassandra Connector Java API: appending to or removing from a collection fails

Time: 2018-05-30 07:11:40

Tags: apache-spark collections cassandra spark-cassandra-connector java-api

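I am trying to append values to a column of type set through the Java API.

The connector seems to ignore the CollectionBehavior I specify and always overwrites the existing collection.

Even when I use CollectionRemove, the values that should be removed are added to the collection instead.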
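To make the expected behavior concrete, here is a minimal in-memory sketch of the three write semantics for a set column. It does not use the connector or Cassandra at all: the class `SetBehaviorModel`, the enum `Behavior`, and the method `apply` are hypothetical names introduced only for illustration. The CQL fragments in the comments show the standard set operations that each behavior is assumed to correspond to.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model only: hypothetical names, not part of the connector API.
public class SetBehaviorModel {

    // The three ways a write can treat an existing set column.
    public enum Behavior { OVERWRITE, APPEND, REMOVE }

    // Returns the set that would be stored after writing `written`
    // on top of `stored` with the given behavior.
    public static Set<Long> apply(Set<Long> stored, Set<Long> written, Behavior behavior) {
        Set<Long> result = new TreeSet<>(stored); // sorted copy; inputs stay untouched
        switch (behavior) {
            case OVERWRITE: // CQL: SET dates = {2, 3}
                result.clear();
                result.addAll(written);
                break;
            case APPEND:    // CQL: SET dates = dates + {2, 3}
                result.addAll(written);
                break;
            case REMOVE:    // CQL: SET dates = dates - {2, 3}
                result.removeAll(written);
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> stored = new HashSet<>(Arrays.asList(1L, 2L));
        Set<Long> written = new HashSet<>(Arrays.asList(2L, 3L));
        System.out.println(apply(stored, written, Behavior.OVERWRITE)); // [2, 3]
        System.out.println(apply(stored, written, Behavior.APPEND));    // [1, 2, 3]
        System.out.println(apply(stored, written, Behavior.REMOVE));    // [1]
    }
}
```

What I observe corresponds to the OVERWRITE case no matter which behavior I request.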

I am following this example:

https://datastax-oss.atlassian.net/browse/SPARKC-340?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel

I am using:

  • spark-core_2.11 2.2.0
  • spark-cassandra-connector_2.11 2.0.5
  • Cassandra 2.1.17

Could it be that these versions do not support this feature?

Here is the implementation code:

// CASSANDRA TABLE
CREATE TABLE test.profile (
    id text PRIMARY KEY,
    dates set<bigint>
);

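// ENTITY
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Serializable so that Spark can ship instances to the executors.
public class ProfileRow implements Serializable {
    // Maps Java property names to Cassandra column names.
    public static final Map<String, String> namesMap;
    static {
        namesMap = new HashMap<>();
        namesMap.put("id", "id");
        namesMap.put("dates", "dates");
    }
    private String id;
    private Set<Long> dates;
    public ProfileRow() {}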
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public Set<Long> getDates() {
        return dates;
    }
    public void setDates(Set<Long> dates) {
        this.dates = dates;
    }
}


// javaFunctions and mapToRow are static imports from
// com.datastax.spark.connector.japi.CassandraJavaUtil.
public void execute(JavaSparkContext context) {
    List<ProfileRow> elements = new LinkedList<>();
    ProfileRow profile = new ProfileRow();
    profile.setId("fGxTObQIXM");
    Set<Long> dates = new HashSet<>();
    dates.add(1L);
    profile.setDates(dates);
    elements.add(profile);
    JavaRDD<ProfileRow> rdd = context.parallelize(elements);

    RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
        .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
    CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
    scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(appendColumn));
    SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

    wb.withColumnSelector(columnSelector);
    wb.saveToCassandra();
}

Thanks,

1 Answer:

Answer 0 (score: 0)

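I found the answer. I had to change two things:

  1. Add the primary key column to the column selector.
  2. WriterBuilder.withColumnSelector() returns a new WriterBuilder instance, so I had to store the new instance.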
Corrected code:

RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
    .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
// Change 1: select the primary key column together with the collection column.
ColumnName pkColumn = new ColumnName("id", Option.empty());
CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, appendColumn));
SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

// Change 2: keep the builder returned by withColumnSelector().
wb = wb.withColumnSelector(columnSelector);
wb.saveToCassandra();
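
The second change is an instance of a general pitfall with immutable builders, which can be reproduced without Spark or Cassandra. The sketch below is only an illustration: `ImmutableBuilderDemo`, `Builder`, `withColumns`, `discardedResult`, and `storedResult` are hypothetical names, not the connector's classes, and the `"*"` default is simply an assumed stand-in for "all columns, default behavior".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only: hypothetical names, not part of the connector API.
public class ImmutableBuilderDemo {

    // Stand-in for an immutable builder: every "with" call returns a new instance.
    static final class Builder {
        private final List<String> columns;

        Builder(List<String> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }

        // Returns a NEW builder; the receiver is left unchanged.
        Builder withColumns(List<String> newColumns) {
            return new Builder(newColumns);
        }

        List<String> columns() {
            return columns;
        }
    }

    // Mirrors the code in the question: the returned builder is thrown away.
    public static List<String> discardedResult() {
        Builder b = new Builder(Arrays.asList("*"));
        b.withColumns(Arrays.asList("id", "dates")); // result ignored
        return b.columns();                          // still the default selection
    }

    // Mirrors the fix: the returned builder replaces the old reference.
    public static List<String> storedResult() {
        Builder b = new Builder(Arrays.asList("*"));
        b = b.withColumns(Arrays.asList("id", "dates"));
        return b.columns();
    }

    public static void main(String[] args) {
        System.out.println(discardedResult()); // [*]
        System.out.println(storedResult());    // [id, dates]
    }
}
```

In the question's code, the builder passed to saveToCassandra() was the original one, which is why the custom column selector with CollectionAppend appeared to be ignored.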