Problem using withColumn() when reading a stream

Asked: 2019-08-29 21:52:15

Tags: apache-spark pyspark

I am trying to read streaming data with Spark using the following code:

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .withColumn("time", unix_timestamp("time")  
    .cast("double")
    .cast("timestamp"))
    .csv(inputPath)
)

But I get the error:

'DataStreamReader' object has no attribute 'withColumn'

Is there an alternative to withColumn() for spark.readStream? I only want to change the type of the time column from string to timestamp.

1 Answer:

Answer 0 (score: 0)

Try moving .withColumn so it comes after .csv, i.e. once the streaming DataFrame has been created. withColumn is a method of DataFrame, not of DataStreamReader, which is why the call fails before .csv(inputPath) is reached:


from pyspark.sql.functions import unix_timestamp

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # withColumn is applied to the streaming DataFrame returned by .csv()
    .withColumn("time", unix_timestamp("time").cast("double").cast("timestamp"))
)
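
If the time column is in a standard date-time string format, a more direct option is to_timestamp, which parses the string into a TimestampType column in one step. This is only a sketch under that assumption; pass an explicit format string if your data is not in the default yyyy-MM-dd HH:mm:ss form:

from pyspark.sql.functions import to_timestamp

eventsDF = (
  spark
    .readStream
    .schema(schema)
    .option("header", "true")
    .option("maxFilesPerTrigger", 1)
    .csv(inputPath)
    # to_timestamp parses the string column directly into a timestamp
    .withColumn("time", to_timestamp("time"))
)

Alternatively, declaring the time field as TimestampType in the schema you pass to .schema(schema) lets Spark parse it while reading, so no cast is needed afterwards.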