I have a parquet file in an ADLS Gen2 storage account. I want to explode every element of an array column into its own row, and then write the result to another ADLS Gen2 location. Initially, the data in the parquet file looks like this:
// Streaming read of the parquet files
val query3 = spark
  .readStream
  .schema(readSchema)
  .format("parquet")
  .load("/mnt/changefeed/EDP_schema")

// Separate batch read, used here to display the data
val df = spark.read.parquet("/mnt/changefeed/EDP_schema")
display(df)
+----+----+---------+
|col1|col2| col3|
+----+----+---------+
| 1| A|[1, 2, 3]|
| 2| B| [3, 5]|
+----+----+---------+
Now I transform the data:
import org.apache.spark.sql.functions.explode
val df1 = df.withColumn("col3", explode($"col3"))
display(df1)
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1| A| 1|
| 1| A| 2|
| 1| A| 3|
| 2| B| 3|
| 2| B| 5|
+----+----+----+
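For reference, `explode` emits one output row per array element, duplicating the other columns. The semantics can be sketched in plain Scala with `flatMap` (no Spark needed; the `Row` case class below is hypothetical, just modeling the table above):

```scala
// Hypothetical in-memory model of the table above
case class Row(col1: Int, col2: String, col3: Seq[Int])

val rows = Seq(
  Row(1, "A", Seq(1, 2, 3)),
  Row(2, "B", Seq(3, 5))
)

// explode($"col3"): one output row per array element,
// with col1 and col2 duplicated across those rows
val exploded = rows.flatMap { r =>
  r.col3.map(v => (r.col1, r.col2, v))
}
// exploded contains (1,"A",1), (1,"A",2), (1,"A",3), (2,"B",3), (2,"B",5)
```

Note that, like Spark's `explode`, this drops rows whose array is empty; `explode_outer` would keep them with a null element.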
Now, when I try to write it to another ADLS Gen2 location, I get the following error:
import org.apache.spark.sql.streaming.Trigger
val writeS = df1
.writeStream
.partitionBy("readS")
.outputMode("append")
.format("parquet")
.option("path", "/mnt/changefeed/EDP_Final_schema")
.option("checkpointLocation", "/checkpoint_parq_merged")
.option("overwriteSchema", true)
.queryName("merge_stream")
.trigger(Trigger.ProcessingTime(10))
.start()
org.apache.spark.sql.AnalysisException: 'writeStream' can be called only on streaming Dataset/DataFrame;
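The error occurs because `df1` is derived from the batch `spark.read.parquet(...)` call, not from the `readStream` source, and `writeStream` can only be called on a streaming Dataset/DataFrame. A minimal sketch of one way to fix it, assuming the same `readSchema` and paths as in the question (untested outside a Spark cluster; `partitionBy("readS")` is dropped here since `readS` is not a column of the data):

```scala
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.streaming.Trigger

// Build the whole pipeline on the *streaming* DataFrame,
// not on the batch df from spark.read.parquet
val streamingDf = spark
  .readStream
  .schema(readSchema)
  .format("parquet")
  .load("/mnt/changefeed/EDP_schema")
  .withColumn("col3", explode($"col3"))

// writeStream is now valid because streamingDf is a streaming DataFrame
val writeS = streamingDf
  .writeStream
  .outputMode("append")
  .format("parquet")
  .option("path", "/mnt/changefeed/EDP_Final_schema")
  .option("checkpointLocation", "/checkpoint_parq_merged")
  .queryName("merge_stream")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
```

The `display(df)` / `display(df1)` calls on the batch DataFrame are fine for inspecting the data; they just must not feed into `writeStream`.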