Question

I am trying to read a data stream from Kafka with the following schema:

val schema = StructType(
      List(
        StructField("timestamp",LongType, true),
        StructField("id",StringType,true),
        StructField("value",DoubleType,true),
      )
    )

The timestamp is a Long holding the time elapsed since the epoch.

I convert the Long value to a timestamp as follows:

val dfNew=df.selectExpr("CAST(value AS STRING)").as[String].select(from_json($"value",schema) as "record")
          .select($"record.id",$"record.value", col("record.timestamp").cast(TimestampType).as("timestamp"))

I want to test a Structured Streaming query that uses a window and a watermark:

val output=dfNew.withWatermark("timestamp", "16 seconds").groupBy(window($"timestamp" , "10 seconds", "5 seconds"),$"id").count()

It produces a result, but the window column shows timestamps far in the future:

+--------------------------------------------------+------------------------------------+-----+
|window                                            |id                                  |count|
+--------------------------------------------------+------------------------------------+-----+
|[50232-03-09 18:13:000.0, 50232-03-09 18:13:100.0]|11c7ebdb-8810-4a51-9d38-4099fd21862a|1    |
|[50232-03-09 17:49:400.0, 50232-03-09 17:49:500.0]|11c7ebdb-8810-4a51-9d38-4099fd21862a|1    |
|[50232-03-09 19:26:500.0, 50232-03-09 19:27:000.0]|58f86590-e27e-44d6-86d3-0905b126c9fd|1    |
|[50232-03-09 18:29:555.0, 50232-03-09 18:30:055.0]|11c7ebdb-8810-4a51-9d38-4099fd21862a|1    |

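Why 50232-03-09 18:13:000.0?

What could be the problem? I suspect the conversion in my Kafka read-stream query is wrong: col("record.timestamp").cast(TimestampType).as("timestamp")

But I could not find anyone who has tried this. Everyone uses from_unixtime(), but that gives me null results, and its resolution is only seconds.

Any solution, please?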

Answer 1

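It's simple. When casting a number to a timestamp, Spark interprets the value as seconds, not milliseconds. So just divide the input by 1000: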

Seq(1523013247000L).toDF.select(
    ($"value" / 1000).cast("timestamp"), // correct
    $"value".cast("timestamp")  // Your current code
).show

// +---------------------------------+--------------------+
// |CAST((value / 1000) AS TIMESTAMP)|               value|
// +---------------------------------+--------------------+
// |              2018-04-06 13:14:07|50232-05-15 05:16...|
// +---------------------------------+--------------------+
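To see where the year 50232 comes from, the same arithmetic can be reproduced without Spark. The sketch below uses java.time purely as an illustration (it is not what Spark runs internally), and the object name EpochDemo is made up for this example. It reads the sample value from the answer once as milliseconds and once as seconds:

```scala
import java.time.{Instant, ZoneOffset}

object EpochDemo {
  // Sample value from the answer above: milliseconds since the epoch.
  val raw: Long = 1523013247000L

  // Reading the number as milliseconds gives the intended instant.
  val asMillis: Instant = Instant.ofEpochMilli(raw)

  // Reading the same number as seconds is off by a factor of 1000,
  // which pushes the result tens of thousands of years into the future.
  val asSeconds: Instant = Instant.ofEpochSecond(raw)

  def main(args: Array[String]): Unit = {
    println(asMillis)                                  // 2018-04-06T11:14:07Z
    println(asSeconds.atZone(ZoneOffset.UTC).getYear)  // 50232
  }
}
```

Applied to the query in the question, the equivalent change would presumably be (col("record.timestamp") / 1000).cast(TimestampType).as("timestamp"). Because the division yields a fractional number of seconds, the millisecond part should survive the cast, unlike with from_unixtime().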

Convert epoch time in milliseconds to a timestamp in Structured Streaming SQL
