I have a PySpark DataFrame df:
+------------+------+
| timestamp | days |
+------------+------+
| 2019-11-21 | 5 |
| 2019-10-22 | 21 |
| ... | ... |
+------------+------+
I want to subtract the number of days from the timestamp:
import pyspark.sql.functions as F
df.withColumn("timestamp", F.date_add(F.col("timestamp"), -F.col("days")))
The expected result would be:
+------------+------+
| timestamp | days |
+------------+------+
| 2019-11-16 | 5 |
| 2019-10-01 | 21 |
| ... | ... |
+------------+------+
But all I get is an error: TypeError: Column is not iterable
Is there a way to make this work?
Answer 0 (score: 0)
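Using a UDF is one solution. Spark column functions such as F.date_add cannot be called inside a UDF, because the UDF receives plain Python values, so the arithmetic is done with datetime.timedelta instead. This assumes the timestamp column holds dates rather than strings.
from datetime import timedelta
from pyspark.sql.types import DateType
date_add_udf = F.udf(lambda date, days: date + timedelta(days=days) if date is not None and days is not None else None, DateType())
Then call it:
df.withColumn("timestamp", date_add_udf(F.col("timestamp"), -F.col("days")))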