Problem using withColumn() when reading a stream

Asked: 2019-08-29 21:52:15

Tags: apache-spark pyspark

I am trying to read streaming data with Spark using the following code:

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .withColumn("time", unix_timestamp("time")  
    .cast("double")
    .cast("timestamp"))
    .csv(inputPath)
)

But I get the error:

'DataStreamReader' object has no attribute 'withColumn'

Is there an alternative to withColumn() for spark.readStream? I only want to change the type of the time column from string to timestamp.

1 Answer:

Answer 0 (score: 0)

Try moving .withColumn so it comes after .csv, i.e. once the streaming DataFrame has been created. withColumn is a method of DataFrame, not of DataStreamReader, which is why the call fails before .csv(inputPath) is reached:


from pyspark.sql.functions import unix_timestamp

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # withColumn is applied to the streaming DataFrame returned by .csv()
    .withColumn("time", unix_timestamp("time").cast("double").cast("timestamp"))
)
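
If the time column is in a standard date-time string format, a more direct option is to_timestamp, which parses the string into a TimestampType column in one step. This is only a sketch under that assumption; pass an explicit format string if your data is not in the default yyyy-MM-dd HH:mm:ss form:

from pyspark.sql.functions import to_timestamp

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # to_timestamp parses the string column directly into a timestamp
    .withColumn("time", to_timestamp("time"))
)

Alternatively, declaring the time field as TimestampType in the schema you pass to .schema(schema) lets Spark parse it while reading, so no cast is needed afterwards.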