Spark Structured Streaming file sink creates empty JSON files

Time: 2020-11-05 21:13:30

Tags: apache-spark pyspark apache-spark-sql spark-streaming spark-structured-streaming

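I am reading data from a Kafka topic and am able to process it.

When I try to store the result in .json format, the output directory on HDFS contains only empty .json files.

I have verified the output using the console sink, and the data is there. The file sink, however, creates empty files.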
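
To confirm which part files are actually empty, a small check like the one below might help. It is only a sketch: it assumes the sink directory has first been copied to the local filesystem (for example with `hdfs dfs -get`), and the function names `count_records` and `empty_part_files` are hypothetical helpers, not part of Spark.

```python
from pathlib import Path


def count_records(output_dir):
    """Map each part-*.json file name to its number of non-blank lines.

    Assumption: output_dir is a local copy of the sink directory and the
    files are uncompressed JSON Lines (one record per line).
    """
    counts = {}
    for path in sorted(Path(output_dir).glob("part-*.json")):
        with path.open(encoding="utf-8") as handle:
            counts[path.name] = sum(1 for line in handle if line.strip())
    return counts


def empty_part_files(output_dir):
    """Return the names of part files that contain no records."""
    return [name for name, n in count_records(output_dir).items() if n == 0]
```

Calling `empty_part_files("output_3")` on the local copy would then list the files that hold no records, which shows whether every micro-batch is empty or only some of them.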

This is the query that writes the output to HDFS:

# Write the aggregated KPIs as JSON files, one micro-batch per minute.
# Note: "truncate" is a console-sink option; the file sink ignores it.
query = KPI_Final_DF \
    .writeStream \
    .outputMode("Append") \
    .format("json") \
    .option("truncate", "false") \
    .option("path","output_3") \
    .option("checkpointLocation", "output_json") \
    .trigger(processingTime="1 minute") \
    .start()

# Block until the streaming query terminates
query.awaitTermination()

Below is the console output: 
-------------------------------------------
Batch: 27
-------------------------------------------
+------------------------------------------+--------------+------------------+---+-------------------+
|window                                    |country       |Total_Volume_Sale |OPM|Rate_Return        |
+------------------------------------------+--------------+------------------+---+-------------------+
|[2020-11-05 16:30:00, 2020-11-05 16:31:00]|United Kingdom|37.010000705718994|2  |0.0                |
|[2020-11-05 16:29:00, 2020-11-05 16:30:00]|United Kingdom|613.1199990212917 |11 |0.15384615384615385|
+------------------------------------------+--------------+------------------+---+-------------------+

-------------------------------------------
Batch: 28
-------------------------------------------
+------------------------------------------+--------------+-----------------+---+-----------+
|window                                    |country       |Total_Volume_Sale|OPM|Rate_Return|
+------------------------------------------+--------------+-----------------+---+-----------+
|[2020-11-05 16:30:00, 2020-11-05 16:31:00]|United Kingdom|66.70999991893768|3  |0.0        |
+------------------------------------------+--------------+-----------------+---+-----------+
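
One possible explanation, not confirmed anywhere in this question: the file sink supports only append mode, and for a windowed aggregation with a watermark, append mode emits a window only after the watermark has passed the end of that window. The console output above shows the same window (16:30:00 to 16:31:00) in batches 27 and 28 with different totals, which suggests the console query ran in update or complete mode instead. The toy simulation below is plain Python, not Spark code. The function name, the 60-second window and delay, and the sample events are all assumptions chosen for illustration. It also simplifies Spark's behavior: the watermark is advanced inside the same batch, and late events are not dropped.

```python
from collections import defaultdict


def simulate_append_mode(batches, window_seconds=60, delay_seconds=60):
    """Toy model of append-mode output for a tumbling-window sum.

    batches: list of micro-batches, each a list of (event_time_s, amount).
    Returns one list per micro-batch holding the (window_start, total)
    rows that would be written in that batch.
    """
    totals = defaultdict(float)  # window_start -> running total (state)
    max_event_time = 0
    output = []
    for batch in batches:
        for event_time, amount in batch:
            window_start = event_time - event_time % window_seconds
            totals[window_start] += amount
            max_event_time = max(max_event_time, event_time)
        # Watermark trails the latest event time by the allowed delay.
        watermark = max_event_time - delay_seconds
        # Only windows that ended at or before the watermark are final.
        closed = sorted(s for s in totals if s + window_seconds <= watermark)
        output.append([(s, totals.pop(s)) for s in closed])
    return output


batches = [
    [(5, 10.0), (20, 5.0)],  # both events fall in window [0, 60)
    [(70, 2.0)],             # window [60, 120)
    [(130, 1.0)],            # window [120, 180)
]
print(simulate_append_mode(batches))
```

In this toy run the first two micro-batches emit nothing, and window [0, 60) is emitted only in the third. If the real query behaves the same way, batches that emit no rows could leave empty files in the sink directory while an update-mode console query still prints intermediate results.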

0 answers:

No answers yet.