Empty CSV files are generated after processing with Spark Structured Streaming

Asked: 2019-01-10 23:38:24

Tags: apache-spark spark-structured-streaming

When I try to write some Spark Structured Streaming data out as CSV, I see empty part files being generated at the HDFS location. When I write the same stream to the console instead, the data does appear on the console.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("micro").
      enableHiveSupport().config("hive.exec.dynamic.partition", "true").
      config("hive.exec.dynamic.partition.mode", "nonstrict").
      config("spark.sql.streaming.checkpointLocation", "/user/sasidhr1/sparkCheckpoint").
      config("spark.debug.maxToStringFields", 100).
      getOrCreate()
    import spark.implicits._

    val mySchema = StructType(Array(
     StructField("id", IntegerType),
     StructField("name", StringType),
     StructField("year", IntegerType),
     StructField("rating", DoubleType),
     StructField("duration", IntegerType)
    ))

    // Reads CSV files from a local directory (the variable is named xmlData, but the source is CSV)
    val xmlData = spark.readStream.option("sep", ",").schema(mySchema).csv("file:///home/sa1/kafdata/")
    import java.util.Calendar
    // Attach the current processing time as an "event_time" column, normalized to UTC
    val df_agg_without_time = xmlData.withColumn("event_time",
      to_utc_timestamp(current_timestamp(), Calendar.getInstance().getTimeZone().getID()))

    // 10-second sliding windows (5-second slide) with a 10-second watermark, aggregated per year
    val df_agg_with_time = df_agg_without_time.withWatermark("event_time", "10 seconds").
      groupBy(window($"event_time", "10 seconds", "5 seconds"), $"year").
      agg(sum($"rating").as("rating"), sum($"duration").as("duration"))

    // Drop the window struct column, since the CSV sink cannot serialize struct types
    val pr = df_agg_with_time.drop("window")

    // In append mode, a windowed aggregate row is emitted only after the watermark passes the window end
    pr.writeStream.outputMode("append").format("csv").
      option("path", "hdfs://ccc/apps/hive/warehouse/rta.db/sample_movcsv/").start()

If I don't drop the window column, a different problem occurs, which I have already posted about here: How to write windowed aggregation in CSV format?
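
(For reference, a common CSV-friendly alternative to dropping the window column entirely is to flatten the struct into plain timestamp columns before writing. This is only a sketch, not code from the original post; the `flattened`, `window_start`, and `window_end` names are illustrative:)

    // Project the window struct's start/end fields into flat columns the CSV sink can handle
    val flattened = df_agg_with_time.select(
      $"window.start".as("window_start"),
      $"window.end".as("window_end"),
      $"year", $"rating", $"duration")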

Can someone help? How can I get the CSV files written to HDFS after the aggregation? Please help.

0 Answers:

No answers yet.