When executing the following code snippet:
...
stream
  .map(_.value())
  .flatMap(MyParser.parse(_))
  .foreachRDD(rdd => {
    val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
    import spark.implicits._
    val dataFrame = rdd.toDF()
    val countsDf = dataFrame.groupBy($"action", window($"time", "1 hour")).count()
    countsDf.write.mode("append").jdbc(url, "stats_table", prop)
  })
...
this error occurs: java.lang.IllegalArgumentException: Can't get JDBC type for struct&lt;start:timestamp,end:timestamp&gt;
How can the output of the org.apache.spark.sql.functions.window()
function be saved to a MySQL DB?
Answer 0: (score: 1)
I ran into the same problem using Spark SQL:
val query3 = dataFrame
  .groupBy(org.apache.spark.sql.functions.window($"timeStamp", "10 minutes"), $"data")
  .count()
  .writeStream
  .outputMode(OutputMode.Complete())
  .options(prop)
  .option("checkpointLocation", "file:///tmp/spark-checkpoint1")
  .option("table", "temp")
  .format("com.here.olympus.jdbc.sink.OlympusDBSinkProvider")
  .start
I solved the problem by adding a user-defined function (note that GenericRowWithSchema comes from org.apache.spark.sql.catalyst.expressions):

val toString = udf { (window: GenericRowWithSchema) => window.mkString("-") }
For me a String worked, but you can change the function as needed; you could even have two functions that return the start and end separately.
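To see concretely what that UDF emits, here is a minimal pure-Scala sketch (no Spark session needed) of the `mkString("-")` formatting applied to a window's start/end fields; the timestamp values are made up for illustration:

```scala
import java.sql.Timestamp

// Stand-in values for the window struct's (start, end) fields; illustrative only.
val start = Timestamp.valueOf("2024-01-01 10:00:00")
val end   = Timestamp.valueOf("2024-01-01 11:00:00")

// Row.mkString("-") joins the field values with "-", just like the UDF above,
// so the struct column becomes a single VARCHAR-compatible string.
val flattened = Seq(start, end).mkString("-")
println(flattened)  // 2024-01-01 10:00:00.0-2024-01-01 11:00:00.0
```

The resulting string maps cleanly to a VARCHAR column, which is why the JDBC writer no longer complains about the struct type.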
My query changed to:
val query3 = dataFrame
  .groupBy(org.apache.spark.sql.functions.window($"timeStamp", "10 minutes"), $"data")
  .count()
  .withColumn("window", toString($"window"))
  .writeStream
  .outputMode(OutputMode.Complete())
  .options(prop)
  .option("checkpointLocation", "file:///tmp/spark-checkpoint1")
  .option("table", "temp")
  .format("com.here.olympus.jdbc.sink.OlympusDBSinkProvider")
  .start