Persisting the output of a window function to a DB using Spark DataFrames in Java

Asked: 2018-08-03 10:50:36

Tags: java scala apache-spark pyspark apache-spark-sql

When executing the following snippet:

Dataset<Row> ds1 = ds
        .groupBy(
            functions.window(ds.col("datetime"), windowLength, slidingLength).as("datetime"),
            ds.col("symbol").as("Ticker"))
        .agg(
            functions.mean("volume").as("volume"),
            functions.mean("price").as("Price"),
            functions.first("price").plus(functions.last("price")).divide(value).as("Mid_Point"),
            functions.max("price").as("High"),
            functions.min("price").as("Low"),
            functions.first("price").as("Open"),
            functions.last("price").as("Close"))
        .sort(functions.asc("datetime"));

ds1.printSchema();

Output:

root
 |-- datetime: struct (nullable = true)
 |    |-- start: timestamp (nullable = true)
 |    |-- end: timestamp (nullable = true)
 |-- Ticker: string (nullable = true)
 |-- volume: double (nullable = true)
 |-- Price: double (nullable = true)
 |-- Mid_Point: double (nullable = true)
 |-- High: double (nullable = true)
 |-- Low: double (nullable = true)
 |-- Open: double (nullable = true)
 |-- Close: double (nullable = true)

Now, when I try to save this to a CSV file, I get an error saying the CSV writer cannot resolve `datetime` as a timestamp.

Error:

cannot resolve 'CAST(`datetime` AS TIMESTAMP)' due to data type mismatch: cannot cast StructType(StructField(start,TimestampType,true), StructField(end,TimestampType,true)) to TimestampType

Does anyone have any ideas?

1 Answer:

Answer 0 (score: 0)

Apply the datetime alias to the column itself instead of applying it to the sliding window:

ds.col("datetime").as("datetime")
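An alternative that keeps the sliding window is to flatten the struct before writing: the CSV source cannot serialize a `StructType` column, but the window's `start` and `end` fields can be pulled out as plain timestamp columns. A minimal, self-contained sketch of this approach is below; the class name, sample data, and column names are illustrative, not from the question, and the aggregation is trimmed to one metric for brevity.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructType;

public class FlattenWindow {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("flatten-window")
                .getOrCreate();

        // Tiny stand-in for the question's trade data (values are illustrative).
        Dataset<Row> ds = spark.sql(
                "SELECT * FROM VALUES "
                + "  (timestamp('2018-08-03 10:00:00'), 'AAPL', 100.0, 10.0), "
                + "  (timestamp('2018-08-03 10:03:00'), 'AAPL', 200.0, 12.0) "
                + "AS trades(datetime, symbol, volume, price)");

        // Same shape as the question: grouping on a time window aliased "datetime"
        // produces a struct<start: timestamp, end: timestamp> column.
        Dataset<Row> ds1 = ds
                .groupBy(
                        functions.window(ds.col("datetime"), "5 minutes").as("datetime"),
                        ds.col("symbol").as("Ticker"))
                .agg(functions.mean("price").as("Price"));

        // The CSV writer rejects struct columns, so extract the window's
        // start/end fields as top-level timestamps and drop the struct.
        Dataset<Row> flat = ds1
                .withColumn("window_start", functions.col("datetime.start"))
                .withColumn("window_end", functions.col("datetime.end"))
                .drop("datetime");

        // Sanity check: no struct columns remain, so flat.write().csv(path)
        // would now succeed.
        boolean anyStruct = Arrays.stream(flat.schema().fields())
                .anyMatch(f -> f.dataType() instanceof StructType);
        if (anyStruct) {
            throw new IllegalStateException("schema still contains a struct column");
        }
        System.out.println("flattened columns: " + String.join(",", flat.columns()));

        spark.stop();
    }
}
```

The same flattening can be written in one step with `ds1.selectExpr("datetime.start as window_start", "datetime.end as window_end", "Ticker", "Price")` if you prefer to control the column order explicitly.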