Question

I have written Python code to read a CSV file from Azure Data Lake:

    invoiceDf = (spark.read.format("csv")
                 .option("header", "true")
                 .option("inferSchema", "true")
                 .option("delimiter", "|")
                 .load(invoicePath))
    invoiceDf.createOrReplaceTempView("TempInvoice")

But what happens if the file has not arrived or cannot be found?
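One way to find out before reading is to list the path first. This is a minimal sketch, assuming the code runs on Databricks, where `dbutils.fs.ls` raises an exception for a path that does not exist. The helper name `path_exists` and its `ls` parameter are hypothetical; `ls` exists only so the function can be tried outside a cluster with a stand-in.

```python
def path_exists(path, ls=None):
    """Return True if `path` can be listed, False otherwise.

    Hypothetical helper. `ls` defaults to Databricks' dbutils.fs.ls;
    pass a stand-in callable to run it outside a Databricks cluster.
    """
    if ls is None:
        ls = dbutils.fs.ls  # noqa: F821 - provided in notebooks by Databricks
    try:
        ls(path)
    except Exception:
        # dbutils.fs.ls raises when the path is missing; the exact exception
        # type depends on the runtime, so this catches broadly.
        return False
    return True
```

In a notebook cell this would be used as `if not path_exists(invoicePath): ...` before calling `spark.read`.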

In the Databricks documentation I found the page "Handling bad records and files", where they explain the option `("badRecordsPath", "/tmp/badRecordsPath")` together with try/except:

    from pyspark.sql import SparkSession
    from pyspark.sql.utils import AnalysisException

    # On Databricks, `spark` already exists; getOrCreate() just returns it.
    spark = SparkSession.builder.getOrCreate()
    try:
        invoiceDf = (spark.read.format("csv")
                     .option("header", "true")
                     .option("inferSchema", "true")
                     .option("delimiter", "|")
                     .option("badRecordsPath", "/tmp/badRecordsPath")
                     .load(invoicePath))
        invoiceDf.createOrReplaceTempView("TempInvoice")
    except AnalysisException:  # raised, e.g., when invoicePath does not exist
        print("file not found")

However, in that case I don't want to run the remaining cells in the notebook; I want to skip them.

How can I achieve this? I'm new to Databricks and Python. Is there something like GOTO or RETURN that I could use instead?
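A common way to skip the rest of a notebook on Databricks is `dbutils.notebook.exit(value)`, which ends the notebook run. This is a sketch, not a definitive implementation: it wraps the read so that the exit call happens outside the `try` block, where the `except` clause cannot interfere with it. The names `load_or_stop`, `read`, and `stop` are hypothetical, and the callables are passed in only so the logic can be exercised without Spark or Databricks.

```python
def load_or_stop(path, read, stop):
    """Try to read `path`; if that fails, call `stop` and return None.

    read -- callable(path) -> DataFrame (in a notebook: the spark.read chain)
    stop -- callable(message) (in a notebook: dbutils.notebook.exit)
    """
    error = None
    try:
        df = read(path)
    except Exception as err:  # a missing path surfaces as AnalysisException in PySpark
        df, error = None, err
    # stop() is deliberately called outside the try block so the except
    # clause above cannot catch whatever it does to end the run.
    if error is not None:
        stop(f"Skipping remaining cells, could not read {path}: {error}")
        return None
    return df
```

In a notebook you would pass a function wrapping the `spark.read.format("csv")...load(p)` chain as `read` and `dbutils.notebook.exit` as `stop`, then call `createOrReplaceTempView("TempInvoice")` on the result. When the notebook runs as a job or via Run All, execution is expected to end at the `dbutils.notebook.exit` call, so the later cells do not run.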

How to skip cells in a Databricks notebook when using the "badRecordsPath" option
