How do I save data from Spark Streaming to Cassandra using Java?

Asked: 2016-05-12 10:33:26

Tags: java apache-spark cassandra spark-cassandra-connector

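I read some entries from a stream on a Linux terminal, assign them to lines, and split them into words. Instead of printing them, though, I want to save them to Cassandra. I have a keyspace named ks with a table named record in it. I know that code like CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra(); should do the job, but I think I am doing something wrong. Can anyone help?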

Here is my Cassandra ks.record table (I added these rows through CQLSH):

id | birth_date                       | name
----+---------------------------------+-----------
10 | 1987-12-01 23:00:00.000000+0000  | Catherine
11 | 2004-09-07 22:00:00.000000+0000  |   Isadora
1  | 2016-05-10 13:00:04.452000+0000  |      John
2  | 2016-05-10 13:00:04.452000+0000  |      Troy
12 | 1970-10-01 23:00:00.000000+0000  |      Anna
3  | 2016-05-10 13:00:04.452000+0000  |    Andrew

Here is my Java code:

import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

import java.util.Arrays;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.*;


public class CassandraStreaming2 {
    public static void main(String[] args) {

        // Create a local StreamingContext with two worker threads and a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("CassandraStreaming");
        JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = sc.socketTextStream("localhost", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(
                (FlatMapFunction<String, String>) x -> Arrays.asList(x.split(" "))
        );

        words.print();
        //CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra();

        sc.start();              // Start the computation
        sc.awaitTermination();   // Wait for the computation to terminate

    }
}

1 answer:

Answer 0 (score: 1)

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/7_java_api.md#saving-data-to-cassandra

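According to the documentation, you also need to pass a RowWriter factory. The most common way to do that is with the mapToRow(Class) API; this is the missing argument described there.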

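But you have another problem: your code does not yet specify the data in a form that can be written to C*. Your JavaDStream contains only Strings, and a single String cannot be turned into a Cassandra row for the given schema.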

Basically, you are telling the connector:

Write "hello" to CassandraTable (id, birthday, value)

without telling it where hello goes (what should the id be? What should the birthday be?).
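
One way to give each entry a shape the connector can write is to map every incoming string to a bean whose properties mirror the table columns, and then pass mapToRow for that bean class to writerBuilder. The following is only a rough sketch under several assumptions that are not in the question: the PersonRecord class and its parse method are hypothetical names, the input lines are assumed to look like "4,Alice", id is assumed to be an int column, and birth_date is assumed to be a timestamp that can simply be set to the current time. The Spark wiring is shown as comments so that the bean itself compiles without the Spark or connector jars.

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical bean mirroring the columns of ks.record (id, birth_date, name).
// Assumes id is an int column and birth_date is a timestamp column.
class PersonRecord implements Serializable {
    private Integer id;
    private Date birthDate;
    private String name;

    // No-argument constructor, as expected of a JavaBean.
    public PersonRecord() { }

    public PersonRecord(Integer id, Date birthDate, String name) {
        this.id = id;
        this.birthDate = birthDate;
        this.name = name;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public Date getBirthDate() { return birthDate; }
    public void setBirthDate(Date birthDate) { this.birthDate = birthDate; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // Assumed input format: "id,name". The birth date is set to "now"
    // purely for illustration.
    static PersonRecord parse(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"id,name\" but got: " + line);
        }
        return new PersonRecord(Integer.valueOf(parts[0].trim()), new Date(), parts[1].trim());
    }
}

// In the streaming job from the question, the strings would be mapped first
// and the resulting stream written with a RowWriter factory:
//
//   JavaDStream<PersonRecord> records = lines.map(PersonRecord::parse);
//   CassandraStreamingJavaUtil.javaFunctions(records)
//       .writerBuilder("ks", "record", mapToRow(PersonRecord.class))
//       .saveToCassandra();
```

If the connector does not resolve the birthDate property to the birth_date column on its own, mapToRow also has overloads that take an explicit property-to-column mapping; check the linked documentation for the version you are using.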