I have a jsonl file that I have read in. I created a temp table view from it and filtered the records I want to modify.
val df = session.read.json("tiny.jsonl")
df.createOrReplaceTempView("tempTable")
val filter = df.select("*").where("field IS NOT NULL")
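Now I'm at the part where I'm trying various things. I want to change the column named "time" to the current timestamp before I write the data back. Sometimes I will want to change it to the current timestamp minus some offset, for example 5 days.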
val change = test.withColumn("server_time", date_add(current_timestamp(), -1))
The example above gives me back a date that is one day before today, not a timestamp.
Edit: a sample dataframe that mocks my jsonl input:
val df = Seq(
(1, "fn", "2018-02-18T22:18:28.645Z"),
(2, "fu", "2018-02-18T22:18:28.645Z"),
(3, null, "2018-02-18T22:18:28.645Z")
).toDF("id", "field", "time")
Expected output:
+---+------+-------------------------+
| id|field |time |
+---+------+-------------------------+
| 1| fn | 2018-04-09T22:18:28.645Z|
| 2| fn | 2018-04-09T22:18:28.645Z|
+---+------+-------------------------+
Answer 0 (score: 1)
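If you want to replace the current time column with the current timestamp, you can use the current_timestamp function. To add a number of days, you can use a SQL INTERVAL expression: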
val df = Seq(
(1, "fn", "2018-02-18T22:18:28.645Z"),
(2, "fu", "2018-02-18T22:18:28.645Z"),
(3, null, "2018-02-18T22:18:28.645Z")
).toDF("id", "field", "time")
.na.drop()
val ddf = df
.withColumn("time", current_timestamp())
.withColumn("newTime", $"time" + expr("INTERVAL 5 DAYS"))
Output:
+---+-----+-----------------------+-----------------------+
|id |field|time |newTime |
+---+-----+-----------------------+-----------------------+
|1 |fn |2018-04-10 15:14:27.501|2018-04-15 15:14:27.501|
|2 |fu |2018-04-10 15:14:27.501|2018-04-15 15:14:27.501|
+---+-----+-----------------------+-----------------------+
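The question asks for the current timestamp minus 5 days, while the snippet above adds 5 days. Below is a minimal sketch of the subtraction, assuming a local SparkSession. The object name ShiftTimestamp, the helper withTimeDaysAgo, and the extra column asDate are hypothetical names introduced only for illustration. The asDate column is there to show why the date_add attempt in the question loses the time of day: date_add returns a date column, whereas subtracting an INTERVAL from a timestamp keeps the timestamp type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{current_timestamp, date_add, expr}

object ShiftTimestamp {

  // Hypothetical helper: overwrite "time" with (now - daysBack days) as a timestamp,
  // and add "asDate" to contrast with what date_add produces (a date, no time part).
  def withTimeDaysAgo(df: DataFrame, daysBack: Int): DataFrame =
    df
      .withColumn("time", current_timestamp() - expr(s"INTERVAL $daysBack DAYS"))
      .withColumn("asDate", date_add(current_timestamp(), -daysBack))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimenting; in a real job reuse the existing session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("shift-timestamp")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question; na.drop() removes the row with a null field.
    val df = Seq(
      (1, "fn", "2018-02-18T22:18:28.645Z"),
      (2, "fu", "2018-02-18T22:18:28.645Z"),
      (3, null, "2018-02-18T22:18:28.645Z")
    ).toDF("id", "field", "time").na.drop()

    val shifted = withTimeDaysAgo(df, 5)
    shifted.printSchema() // "time" is a timestamp column, "asDate" is a date column
    shifted.show(false)

    spark.stop()
  }
}
```

If the asDate column is not needed, the second withColumn can simply be dropped. The interval form also accepts other units, such as INTERVAL 12 HOURS.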