Question

In my Spark application, I had to split the date and time and store them in separate columns, as follows:

val df5 = df4
  .withColumn("read_date", date_format(df4.col("date"), "yyyy-MM-dd"))
  .withColumn("read_time", date_format(df4.col("date"), "HH:mm:ss"))
  .drop("date")

This command splits the date and the time:

------------+-------------
2012-01-12     00:06:00
------------+-------------

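However, both fields are created as String. I can use .cast("date") for the date column, but what data type should I use for the time column? If I use something like .cast("timestamp"), it combines the time with the current server date. Since we are going to visualize the data in Power BI, do you think storing the time as a String is the right approach?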

Answer 1

Spark has no DataType for holding an 'HH:mm:ss' value. Instead, you can use the hour(), minute(), and second() functions to represent each part of the time separately.

All of these functions return an int.

hour(string date) -- Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.

minute(string date) -- Returns the minute of the timestamp.

second(string date) -- Returns the second of the timestamp.
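
As a rough sketch of how this could look with the DataFrame API (run in spark-shell or as a Scala script): the sample row, the "raw" column, and the read_hour / read_minute / read_second column names are made up for illustration, and df4 here only stands in for the DataFrame from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hour, minute, second, to_date, to_timestamp}

// Local session for the sketch; in spark-shell this returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("split-date-time")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for df4: a single timestamp column named "date".
val df4 = Seq("2012-01-12 00:06:00").toDF("raw")
  .withColumn("date", to_timestamp(col("raw"), "yyyy-MM-dd HH:mm:ss"))
  .drop("raw")

// Keep the date as a real DateType column and the time parts as ints.
val df5 = df4
  .withColumn("read_date", to_date(col("date")))
  .withColumn("read_hour", hour(col("date")))
  .withColumn("read_minute", minute(col("date")))
  .withColumn("read_second", second(col("date")))
  .drop("date")

df5.printSchema()
df5.show()
```

This way read_date is a proper date and the three time parts are integers, so nothing gets silently combined with the server's current date. If Power BI needs a single time value, keeping the 'HH:mm:ss' string next to these columns is also an option.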
