Question

I am reading a CSV file in Spark 2.0 and counting the non-null values in a column with the following:

val df = spark.read.option("header", "true").csv(dir)

df.filter("IncidntNum is not null").count()

When I test this in spark-shell, it works fine. When I build a jar containing the code and run it with spark-submit, I get an exception on the second line above:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '' expecting {'(', 'SELECT', ..
== SQL ==
IncidntNum is not null
^^^

        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:197)

Any idea why this happens, given that the same code works in spark-shell?

Answer 1

This question has been around for a while, but better late than never.

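The most likely cause I can think of is that when you run with spark-submit you are in "cluster" mode. In that mode the driver process runs on a different machine than it does when you use spark-shell, which could cause Spark to read a different file.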
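One way to check this is to have the submitted job report where it is running and what it actually read. The following is only a minimal sketch under a few assumptions: the object name `NonNullCount` is hypothetical, the CSV directory is assumed to be passed as the first program argument, and the column-based filter is a substitute for the SQL string in the question (it expresses the same condition without going through the SQL expression parser), not a confirmed fix for the exception.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name, used only for illustration.
object NonNullCount {
  def main(args: Array[String]): Unit = {
    // Assumption: the CSV directory is passed as the first argument.
    val dir = args(0)

    val spark = SparkSession.builder().appName("NonNullCount").getOrCreate()

    // Reports "client" or "cluster". In cluster mode the driver runs on a
    // cluster node, so a local path may resolve to a different file.
    println(s"deploy mode: ${spark.sparkContext.deployMode}")

    val df = spark.read.option("header", "true").csv(dir)

    // Show the column names the driver actually parsed from the header.
    println(df.columns.mkString(", "))
    df.printSchema()

    // Column-based equivalent of filter("IncidntNum is not null").
    val nonNull = df.filter(col("IncidntNum").isNotNull).count()
    println(s"non-null IncidntNum rows: $nonNull")

    spark.stop()
  }
}
```

In cluster mode the `println` output ends up in the driver's log on the cluster rather than in the terminal where spark-submit was invoked, so that is where to look for the deploy mode and column names.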
