I am trying to read data from Kafka and store it in a Cassandra table through a Spark RDD.
I get an error when compiling the code:
/root/cassandra-count/src/main/scala/KafkaSparkCassandra.scala:69: value split is not a member of (String, String)
[error] val lines = messages.flatMap(line => line.split(',')).map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
The code is below. It works fine when I run it manually through the interactive spark-shell, but the error appears when I compile it for spark-submit.
// Create direct kafka stream with brokers and topics
val topicsSet = Set[String] (kafka_topic)
val kafkaParams = Map[String, String]("metadata.broker.list" -> kafka_broker)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, topicsSet)
// Create the processing logic
// Get the lines, split
val lines = messages.map(line => line.split(',')).map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))
lines.saveToCassandra("stream_poc", "US_city", SomeColumns("city_name", "jan_temp", "lat", "long"))
Answer 0 (score: 3)
All messages in Kafka are keyed. The raw Kafka stream (messages in this case) is a stream of (key, value) tuples.
As the compilation error points out, there is no split method on a tuple.
What we want to do is:
messages.map { case (key, value) => value.split(',') } ...
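As a plain-Scala sketch of the same idea (no Spark needed; the key and the sample record are made up), pattern matching on one (key, value) pair and splitting only the value:

```scala
object TupleSplitDemo {
  def main(args: Array[String]): Unit = {
    // A Kafka record arrives as a (key, value) tuple; split applies to the value.
    val message: (String, String) = ("some-key", "Boston,-2.0,42.36,-71.06")
    val fields: Array[String] = message match {
      case (_, value) => value.split(',')
    }
    println(fields.mkString(" | "))
  }
}
```

Calling split directly on `message` reproduces the question's compile error, since `Tuple2` has no such method.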
Answer 1 (score: 2)
KafkaUtils.createDirectStream returns a tuple of key and value (because messages in Kafka are optionally keyed). In your case it has type (String, String). If you want to split the value, you have to take it out first:
val lines =
  messages
    .map(line => line._2.split(','))
    .map(s => (s(0), s(1).toDouble, s(2).toDouble, s(3).toDouble))
Or, using partial function syntax:
val lines =
  messages
    .map { case (_, value) => value.split(',') }
    .map(s => (s(0), s(1).toDouble, s(2).toDouble, s(3).toDouble))
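For completeness, the per-record parsing step can be pulled out into a small function that is testable without any Spark context (the field order mirrors SomeColumns("city_name", "jan_temp", "lat", "long") from the question; the sample row is invented):

```scala
object ParseRow {
  // Equivalent of .map(s => (s(0).toString, s(1).toDouble, s(2).toDouble, s(3).toDouble))
  // applied to a single comma-separated value string.
  def parse(value: String): (String, Double, Double, Double) = {
    val s = value.split(',')
    (s(0), s(1).toDouble, s(2).toDouble, s(3).toDouble)
  }

  def main(args: Array[String]): Unit = {
    println(parse("Boston,-2.0,42.36,-71.06"))
  }
}
```

Note that `toDouble` throws a NumberFormatException on malformed input, so in a real stream you may want to wrap the conversion in a Try.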