Bulk inserting into a Cassandra UDT using Spark

Date: 2017-01-26 12:22:48

Tags: java apache-spark cassandra spark-cassandra-connector

I am using Spark to bulk insert Facebook data for analysis, and I have modeled comments as a UDT in Cassandra. There is a posts table that contains a comments column. Here is the schema:

CREATE TYPE analytics.comment (
    commentid text,
    commenttext varchar,
    username text,
    commentdatetime timestamp
);

CREATE TABLE analytics.posts (
    postid text,
    username text,
    comments set<frozen<comment>>,
    datetime timestamp,
    posttext varchar,
    PRIMARY KEY (datetime, username)
);
From a Spark job written in Java, I am creating a JavaRDD (of type FbPost) that I need to save into the Cassandra table. I have POJOs for FbPost and Comment.
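For reference, my Comment POJO looks roughly like this (a minimal sketch reconstructed from the exception message below; getters, setters, and the FbPost class are omitted):

public class Comment implements java.io.Serializable {
    // Field names mirror the analytics.comment UDT fields
    private String commentId;
    private String commenttext;
    private String username;
    private java.util.Date commentdatetime;
    // getters and setters omitted
}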

 javaFunctions(fbPostFromMysql).writerBuilder("analytics", "posts", fbPostWriter).saveToCassandra();
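(As an aside, I know the connector also ships a reflection-based mapper that is supposed to map JavaBean properties, including nested UDT fields, to columns by name. A minimal, untested sketch of that approach, assuming FbPost and Comment follow JavaBean conventions:)

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

// Let the connector derive the column mapping from the FbPost bean by name,
// instead of using a hand-written RowWriter.
javaFunctions(fbPostFromMysql)
    .writerBuilder("analytics", "posts", mapToRow(FbPost.class))
    .saveToCassandra();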

I suspect I am not correctly mapping my Java Comment POJOs to UDTValue, which is why I get the TypeConversionException below. Here is the stack trace:

com.datastax.spark.connector.types.TypeConversionException: Cannot convert object Comment [commentId=642528915901340_642701759217389, commenttext=Prix, username=Angel Nannousa, commentdatetime=2016-07-0
of type class com.prophecy.spark.cassandra.model.Comment to com.datastax.driver.core.UDTValue

Here is the Java code for the RowWriter:
import java.io.Serializable;

import scala.collection.IndexedSeq;
import scala.collection.Seq;

import com.datastax.spark.connector.ColumnRef;
import com.datastax.spark.connector.cql.TableDef;
import com.datastax.spark.connector.writer.RowWriter;
import com.datastax.spark.connector.writer.RowWriterFactory;

public class FbPostRowWriter implements RowWriter<FbPost> {
    private static final long serialVersionUID = 1L;
    private static RowWriter<FbPost> writer = new FbPostRowWriter();

    // Factory passed to writerBuilder(); hands out the shared writer instance
    public static class FbPostRowWriterFactory implements RowWriterFactory<FbPost>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public RowWriter<FbPost> rowWriter(TableDef tableDef, IndexedSeq<ColumnRef> selectedColumns) {
            return writer;
        }
    }

    // Column names, in the order the buffer slots are filled below
    @Override
    public Seq<String> columnNames() {
        return scala.collection.JavaConversions.asScalaBuffer(FbPost.columns()).toList();
    }

    // Copy one FbPost's field values into the buffer, one slot per column
    @Override
    public void readColumnValues(FbPost summary, Object[] buffer) {
        buffer[0] = summary.getPostId();
        buffer[1] = summary.getUsername();
        buffer[2] = summary.getComments(); // the set<frozen<comment>> column
        buffer[3] = summary.getDatetime();
        buffer[4] = summary.getPosttext();
    }
}
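My understanding is that the RowWriter must hand the connector values it already knows how to serialize, so the Set<Comment> probably needs to be converted to a Set<UDTValue> first. Below is a minimal sketch of such a conversion using the driver's type metadata; the metadata parameter and the Comment getter names are assumptions on my part:

import java.util.HashSet;
import java.util.Set;

import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.UDTValue;
import com.datastax.driver.core.UserType;

// Hypothetical helper: convert Comment POJOs into driver UDTValues.
// "metadata" would come from an already-built Cluster (cluster.getMetadata()).
static Set<UDTValue> toUdtValues(Set<Comment> comments, Metadata metadata) {
    // Look up the UDT definition from the keyspace metadata
    UserType commentType = metadata.getKeyspace("analytics").getUserType("comment");
    Set<UDTValue> result = new HashSet<>();
    for (Comment c : comments) {
        result.add(commentType.newValue()
                .setString("commentid", c.getCommentId())
                .setString("commenttext", c.getCommenttext())
                .setString("username", c.getUsername())
                // setTimestamp is the 3.x driver API; 2.x drivers used setDate
                .setTimestamp("commentdatetime", c.getCommentdatetime()));
    }
    return result;
}

readColumnValues would then put the converted set into buffer[2] instead of the raw POJOs.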

Please suggest how to map the set of comments so that the data can be inserted into the UDT column.

0 Answers:

There are no answers yet.