I am trying to read streaming data with Spark using the following code:
eventsDF = (
    spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .withColumn("time", unix_timestamp("time")
        .cast("double")
        .cast("timestamp"))
    .csv(inputPath)
)
But I get this error:
'DataStreamReader' object has no attribute 'withColumn'
Is there an alternative to withColumn() when using spark.readStream()? I just want to change the time column's type from string to timestamp.
Answer 0 (score: 0)
Try moving .withColumn so it comes after .csv, i.e. after the streaming DataFrame has been created:
from pyspark.sql.functions import unix_timestamp

eventsDF = (
    spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # withColumn is available on the streaming DataFrame returned by .csv(),
    # not on the DataStreamReader itself
    .withColumn("time", unix_timestamp("time").cast("double").cast("timestamp"))
)
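
For completeness, here is a minimal self-contained sketch. The schema, inputPath, and column names are placeholders assumed only for illustration, and it assumes Spark 2.2+, where to_timestamp can parse the string column directly instead of the unix_timestamp/cast chain:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("csv-stream").getOrCreate()

# Hypothetical schema and input path, for illustration only
schema = StructType([
    StructField("time", StringType(), True),
    StructField("event", StringType(), True),
])
inputPath = "/tmp/events/"

eventsDF = (
    spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # Parse the string column into a timestamp on the streaming DataFrame
    .withColumn("time", to_timestamp("time"))
)

The key point is the same either way: the reader (spark.readStream...) only configures the source, and column transformations like withColumn apply to the DataFrame it produces.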