I have the following sample code, where the input is a streaming source coming from Azure Event Hubs, and I am running this job in Databricks:
val input = spark.readStream
.schema(schema)
.option("sep", ",")
.option("header", "true")
.format("csv")
.load("/some/sample/path")
// added some new features/columns
val res1 = input.withColumn("col1", lit("")).withColumn("col2", lit(""))
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "cook")
res1.writeStream
.option("checkpointLocation", "/some/path")
.queryName("cook")
.format("avro")
.option("path", "/some/cooked/data/path")
.start()
// added some more features/columns
val res2 = res1.withColumn("col3", lit("")).withColumn("col4", lit(""))
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "applicationMain")
res2.writeStream
.option("checkpointLocation", "/some/path2")
.queryName("applicationMain")
.format("avro")
.option("path", "/some/other/data/path2")
.start()
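Both start() calls return immediately with a StreamingQuery handle; keeping the driver waiting on the queries would look roughly like this (a minimal sketch, assuming nothing else runs after the two start() calls):

// Sketch: start() is non-blocking, so block the driver here until one of the
// two queries terminates or fails.
spark.streams.awaitAnyTermination()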
My fairscheduler.xml is:
<?xml version="1.0"?>
<allocations>
<pool name="cook">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
<pool name="applicationMain">
<schedulingMode>FAIR</schedulingMode>
<weight>1</weight>
<minShare>2</minShare>
</pool>
</allocations>
And I also pass the following settings to spark-submit:
"--conf","spark.scheduler.mode=FAIR",
"--conf","spark.scheduler.allocation.file=/dbfs/fairscheduler.xml"
In the Spark UI the job appears to be doing fine.
Problem: the cook query seems to write its data correctly, but the applicationMain query does not seem to execute: there are no log/error/warning messages, and although the UI suggests it ran, the output directory it writes to is empty.
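To double-check from the driver side, listing the active queries and any captured exception looks roughly like this (a minimal sketch; the names match the queryName values set above):

// Sketch: inspect both streaming queries from the driver.
spark.streams.active.foreach { q =>
  println(s"name=${q.name} isActive=${q.isActive}")
  println(s"lastProgress=${q.lastProgress}")          // null until a micro-batch completes
  q.exception.foreach(e => println(s"failed with: ${e.getMessage}"))
}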
Has anyone run into a similar issue? Any idea what I might be missing?
Thanks!