Add a time interval to a column in a DataFrame

Asked: 2019-07-18 13:18:01

Tags: scala apache-spark

Below is my DataFrame.

    import spark.implicits._
    val lastRunDtDF = sc.parallelize(Seq(
      (1, 2, "2019-07-18 13:34:24")
    )).toDF("id", "cnt", "run_date")

    lastRunDtDF.show

    +---+---+-------------------+
    | id|cnt|           run_date|
    +---+---+-------------------+
    |  1|  2|2019-07-18 13:34:24|
    +---+---+-------------------+

I want to create a new DataFrame by adding 2 minutes to the existing run_date column, with the new column named new_run_date. Sample output is below.

    +---+---+-------------------+-------------------+
    | id|cnt|           run_date|       new_run_date|
    +---+---+-------------------+-------------------+
    |  1|  2|2019-07-18 13:34:24|2019-07-18 13:36:24|
    +---+---+-------------------+-------------------+

I am trying something like the following:

    lastRunDtDF.withColumn("new_run_date", lastRunDtDF("run_date") + "INTERVAL 2 MINUTE")

but it does not look right. Thanks in advance for your help.

1 Answer:

Answer 0 (score: 0)

Try wrapping INTERVAL 2 MINUTE in the expr function:

    import org.apache.spark.sql.functions.expr

    // run_date is a string column; Spark casts it to timestamp when the interval is added
    lastRunDtDF.withColumn("new_run_date", lastRunDtDF("run_date") + expr("INTERVAL 2 MINUTE"))
               .show()

Result:

    +---+---+-------------------+-------------------+
    | id|cnt|           run_date|       new_run_date|
    +---+---+-------------------+-------------------+
    |  1|  2|2019-07-18 13:34:24|2019-07-18 13:36:24|
    +---+---+-------------------+-------------------+
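If you prefer to keep the type handling explicit, a minimal sketch (using the same lastRunDtDF; the val name withNewRunDate is just illustrative) casts run_date to a timestamp before adding the interval:

    import org.apache.spark.sql.functions.{col, expr}

    // cast the string column to timestamp explicitly, then add the 2-minute interval;
    // new_run_date ends up as a timestamp column
    val withNewRunDate = lastRunDtDF
      .withColumn("new_run_date", col("run_date").cast("timestamp") + expr("INTERVAL 2 MINUTE"))

    withNewRunDate.show(false)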

(or)

Using the from_unixtime and unix_timestamp functions:

    import org.apache.spark.sql.functions._

    // unix_timestamp turns run_date into epoch seconds; add 2*60 seconds,
    // then format back to a string with from_unixtime
    lastRunDtDF.selectExpr("*",
        "from_unixtime(unix_timestamp(run_date) + 2*60, 'yyyy-MM-dd HH:mm:ss') as new_run_date")
               .show()

Result:

    +---+---+-------------------+-------------------+
    | id|cnt|           run_date|       new_run_date|
    +---+---+-------------------+-------------------+
    |  1|  2|2019-07-18 13:34:24|2019-07-18 13:36:24|
    +---+---+-------------------+-------------------+
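The same unix-timestamp arithmetic can also be written with the Column-based functions instead of selectExpr; a sketch under the same assumptions:

    import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

    // convert run_date to epoch seconds, add 120 seconds,
    // and format the result back into 'yyyy-MM-dd HH:mm:ss'
    lastRunDtDF
      .withColumn("new_run_date",
        from_unixtime(unix_timestamp(col("run_date")) + 2 * 60, "yyyy-MM-dd HH:mm:ss"))
      .show()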