I have the following sample code, where the input is a streaming source coming from Azure Event Hubs, and I am running this job in Databricks:
val input = spark.readStream
.schema(schema)
.option("sep", ",")
.option("header", "true")
.format("csv")
.load("/some/sample/path")
// added some new features/columns
val res1 = input.withColumn("col1", lit("")).withColumn("col2", lit(""))
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "cook")
res1.writeStream
.option("checkpointLocation", "/some/path")
.queryName("cook")
.format("avro")
.option("path", "/some/cooked/data/path")
.start()
// added some more features/columns
val res2 = res1.withColumn("col3", lit("")).withColumn("col4", lit(""))
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "applicationMain")
res2.writeStream
.option("checkpointLocation", "/some/path2")
.queryName("applicationMain")
.format("avro")
.option("path", "/some/other/data/path2")
.start()
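Both start() calls return immediately with a StreamingQuery handle; keeping the driver waiting on the queries would look roughly like this (a minimal sketch, assuming nothing else runs after the two start() calls):

// Sketch: start() is non-blocking, so block the driver here until one of the
// two queries terminates or fails.
spark.streams.awaitAnyTermination()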
My fairscheduler.xml is:
<?xml version="1.0"?>
<allocations>
<pool name="cook">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
<pool name="applicationMain">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
</allocations>
And I also pass the following settings to spark-submit:
"--conf","spark.scheduler.mode=FAIR",
"--conf","spark.scheduler.allocation.file=/dbfs/fairscheduler.xml"
In the Spark UI the job appears to be doing fine.
Problem: the cook query seems to write its data correctly, but the applicationMain query does not seem to execute: there are no log/error/warning messages, and although the UI suggests it ran, the output directory it writes to is empty.
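To double-check from the driver side, listing the active queries and any captured exception looks roughly like this (a minimal sketch; the names match the queryName values set above):

// Sketch: inspect both streaming queries from the driver.
spark.streams.active.foreach { q =>
  println(s"name=${q.name} isActive=${q.isActive}")
  println(s"lastProgress=${q.lastProgress}")          // null until a micro-batch completes
  q.exception.foreach(e => println(s"failed with: ${e.getMessage}"))
}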
Has anyone run into a similar issue? Any idea what I might be missing?
Thanks!