Saving data from a Kafka topic to Cassandra

Date: 2016-02-16 15:17:46

Tags: cassandra apache-kafka spark-streaming spark-cassandra-connector

I'm learning Spark Streaming and trying to save sample stock data received from a Kafka topic (just strings such as "MSFT:28.29") to Cassandra, using spark-streaming and the Spark Cassandra Connector.

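Without the save to Cassandra, my code works fine (it reads data from Kafka and computes some simple statistics). Cassandra is configured and the connection is established.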

But if I add the following line to save the raw data to a Cassandra table before processing:

 stockParsed.saveToCassandra("dashboard","raw_tick")

then the Spark Streaming UI shows one batch stuck in the "Processing" state and all the others in the "Queued" state, and no data ever shows up in Cassandra.

In the Spark console I only see lines like these:

16/02/16 10:18:40 INFO JobScheduler: Added jobs for time 1455635920000 ms
16/02/16 10:18:50 INFO JobScheduler: Added jobs for time 1455635930000 ms
16/02/16 10:19:00 INFO JobScheduler: Added jobs for time 1455635940000 ms

Here is my code:

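import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._            // SomeColumns
import com.datastax.spark.connector.streaming._  // saveToCassandra on DStreams

case class Stock(ticker: String, price: Double)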
// ....

val conf = new SparkConf().setAppName("KafkaStream").setMaster("local[*]")
  .set("spark.cassandra.connection.host", "localhost")
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")
  .set("spark.cassandra.connection.keep_alive_ms","60000")
  .set("spark.cassandra.input.split.size_in_mb","1")

val ssc = new StreamingContext(conf, Seconds(10))

val topicMap = Map("test" -> 1)

val lines = KafkaUtils.createStream(ssc, "localhost:2181", "test-group", topicMap).map(_._2)

val stockParsed = lines.map(line => line.split(':')).map(s => Stock(s(0).toString, s(1).toDouble))

//Problem here
stockParsed.saveToCassandra("dashboard","raw_tick",SomeColumns("ticker", "price"))

//Some processing below
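The parsing step above calls `toDouble` directly, so a single malformed Kafka message would throw inside the job. This is not what caused the hang described here, but a minimal sketch of a more defensive parser may still be useful. It assumes every valid message has the `TICKER:PRICE` shape shown in the question; the `TickParser` object is a hypothetical name, and `Stock` mirrors the question's case class.

```scala
import scala.util.Try

// Mirrors the case class from the question.
case class Stock(ticker: String, price: Double)

// Hypothetical helper: turns "TICKER:PRICE" strings into Stock values.
object TickParser {
  // Returns None for anything that is not exactly "TICKER:PRICE" with a
  // non-empty ticker and a numeric price, instead of throwing.
  def parse(line: String): Option[Stock] =
    line.split(':') match {
      case Array(ticker, price) if ticker.trim.nonEmpty =>
        Try(price.trim.toDouble).toOption.map(p => Stock(ticker.trim, p))
      case _ => None
    }
}
```

In the streaming code this could replace the two `map` calls, for example `val stockParsed = lines.flatMap(line => TickParser.parse(line).toList)`, which silently drops unparseable messages rather than failing the batch.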

My build.sbt:

import sbt.Keys._

name := "KafkaStreamSbt"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"  % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-assembly" % "1.6.0"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector-java" % "1.5.0-RC1"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.16"

Any ideas how to fix this?

1 answer:

Answer 0 (score: 0)

Problem solved: I had made a mistake in the Cassandra keyspace configuration. After recreating the keyspace with this script:

CREATE KEYSPACE tutorial WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

the code works fine.
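Because the fix was to recreate the keyspace, it can help to keep the development schema in code so it is recreated the same way every time. The following is a minimal sketch under stated assumptions: `DevSchema` is a hypothetical helper, the keyspace name is a parameter (the answer's script uses `tutorial`, while the question's code writes to `dashboard`), and the `raw_tick` layout, with `ticker` as the only primary-key column, is an assumed table for the question's `Stock(ticker, price)` rows, not one taken from the question.

```scala
// Hypothetical helper that builds DDL strings for a single-node dev setup.
object DevSchema {
  // SimpleStrategy with replication_factor 1, as in the answer's script,
  // suits a single local Cassandra node.
  def createKeyspace(keyspace: String, replicationFactor: Int = 1): String =
    s"CREATE KEYSPACE IF NOT EXISTS $keyspace WITH replication = " +
      s"{'class': 'SimpleStrategy', 'replication_factor': $replicationFactor}"

  // Assumed layout for the Stock(ticker, price) rows. With ticker as the only
  // primary-key column, each write overwrites the previous price for that ticker.
  def createRawTickTable(keyspace: String): String =
    s"CREATE TABLE IF NOT EXISTS $keyspace.raw_tick (ticker text PRIMARY KEY, price double)"
}
```

These strings could be executed once before `ssc.start()` through the connector's session helper, e.g. `CassandraConnector(ssc.sparkContext.getConf).withSessionDo { session => session.execute(DevSchema.createKeyspace("dashboard")); session.execute(DevSchema.createRawTickTable("dashboard")) }` (with `import com.datastax.spark.connector.cql.CassandraConnector`). A table that keeps every tick would need an additional clustering column, such as a timestamp, which the question's `Stock` class does not have.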