I want to store my Avro Kafka stream to the filesystem in a delimited format using the Spark Streaming API and the Scala code below, but I am facing some challenges getting there:
record.write.mode(SaveMode.Append).csv("/Users/Documents/kafka-poc/consumer-out/")
Since record (a GenericRecord) is neither a DataFrame nor an RDD, I don't know how to proceed.
Code:
// Create a direct Kafka stream of Avro-encoded messages and keep only the values.
val messages = SparkUtilsScala.createCustomDirectKafkaStreamAvro(ssc, kafkaParams, zookeeper_host, kafkaOffsetZookeeperNode, topicsSet)
val requestLines = messages.map(_._2)

requestLines.foreachRDD { (rdd, time: Time) =>
  rdd.foreachPartition { partitionOfRecords =>
    // Decode each Avro payload back into a GenericRecord.
    val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
    for (avroLine <- partitionOfRecords) {
      val record = recordInjection.invert(avroLine).get
      println("Consumer output...." + record)
      println("Consumer output schema...." + record.getSchema)
    }
  }
}
Below is the output and the schema:
{"username": "Str 1-0", "tweet": "Str 2-0", "timestamp": 0}
{"type":"record","name":"twitter_schema","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"int"}]}
Thanks in advance, and I appreciate any help.
Answer 0 (score: 0)
I found the solution.
// Convert the decoded GenericRecord to its JSON string form, read it back as a
// DataFrame, and append the result to the output directory as CSV.
val jsonStrings: RDD[String] = sc.parallelize(Seq(record.toString()))
val result = sqlContext.read.json(jsonStrings).toDF()
result.write.mode("Append").csv("/Users/Documents/kafka-poc/consumer-out/")
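For context, here is a minimal sketch (not a verified implementation) of how this conversion could be wired into the foreachRDD loop from the question. It assumes a SQLContext created on the driver from ssc.sparkContext, reuses SparkUtilsJava.getRecordInjection and the output path from the question, and converts the whole micro-batch at once instead of building one RDD per record:

import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.streaming.Time

// Assumption: sqlContext is created once on the driver.
val sqlContext = new SQLContext(ssc.sparkContext)

requestLines.foreachRDD { (rdd, time: Time) =>
  // Decode the Avro payloads to JSON strings on the executors.
  val jsonStrings = rdd.mapPartitions { partitionOfRecords =>
    val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
    partitionOfRecords.map(avroLine => recordInjection.invert(avroLine).get.toString)
  }

  if (!jsonStrings.isEmpty()) {
    // Let Spark infer the schema from the JSON and append the batch as CSV.
    val result = sqlContext.read.json(jsonStrings)
    result.write.mode(SaveMode.Append).csv("/Users/Documents/kafka-poc/consumer-out/")
  }
}

The design choice here is the same as in the answer: GenericRecord.toString already emits JSON, so sqlContext.read.json can infer the username/tweet/timestamp schema and hand back a DataFrame that supports csv() writes.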