我有一个用Scala编写的spark作业,其中我只想写一行用逗号分隔,从Kafka制作人到Cassandra数据库。 但我无法调用saveToCassandra。 我看到几个wordcount的例子,他们正在用两列编写地图结构到Cassandra表,看起来工作正常。但我有很多专栏,我发现数据结构需要并行化。 这是我的代码示例:
object TestPushToCassandra extends SparkStreamingJob {
def validate(ssc: StreamingContext, config: Config): SparkJobValidation = SparkJobValid
def runJob(ssc: StreamingContext, config: Config): Any = {
val bp_conf=BpHooksUtils.getSparkConf()
val brokers=bp_conf.get("bp_kafka_brokers","unknown_default")
val input_topics = config.getString("topics.in").split(",").toSet
val output_topic = config.getString("topic.out")
val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, input_topics)
val lines = messages.map(_._2)
val words = lines.flatMap(_.split(","))
val li = words.par
li.saveToCassandra("testspark","table1", SomeColumns("col1","col2","col3"))
li.print()
words.foreachRDD(rdd =>
rdd.foreachPartition(partition =>
partition.foreach{
case x:String=>{
val props = new HashMap[String, Object]()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer")
val outMsg=x+" from spark"
val producer = new KafkaProducer[String,String](props)
val message=new ProducerRecord[String, String](output_topic,null,outMsg)
producer.send(message)
}
}
)
)
ssc.start()
ssc.awaitTermination()
}
}
我认为Scala的语法并不正确。 提前谢谢。
答案 0 :(得分:1)
您需要将单词DStream
更改为连接器可以处理的内容。
像一个元组
val words = lines
.map(_.split(","))
.map( wordArr => (wordArr(0), wordArr(1), wordArr(2))
或案例类
case class YourRow(col1: String, col2: String, col3: String)
val words = lines
.map(_.split(","))
.map( wordArr => YourRow(wordArr(0), wordArr(1), wordArr(2)))
或CassandraRow
这是因为如果你把一个数组单独放在那里它可能是一个C *中的数组,你试图插入而不是3列。
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md