I am trying to load some JSON files from HDFS using spark.read.json. When I hardcode the path as "hdfs://ha-cluster-dev/test/inputs/*" and submit the jar with spark-submit, everything works fine.
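For reference, the hardcoded variant that works looks roughly like this (a minimal sketch; the object name is a placeholder and the real job does more after the read):

import org.apache.spark.sql.SparkSession

object One {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Count").getOrCreate()
    // spark.read.json accepts a glob pattern and expands it against HDFS
    val df = spark.read.json("hdfs://ha-cluster-dev/test/inputs/*")
    df.show()
  }
}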
I then tried to pass the path on the command line instead, with the code below:
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Two {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://ha-cluster-dev")
    val fs = FileSystem.get(new URI("hdfs://ha-cluster-dev"), conf)
    val spark = SparkSession.builder().appName("Count").getOrCreate()
    import spark.implicits._
    // open the path passed on the command line and hand its string form to the reader
    val input = fs.open(new Path(args(0)))
    val df = spark.read.json(input.toString())
    // ...my operations...
  }
}
and submitted it with the spark-submit command below:
/home/hadoop/spark/bin/spark-submit --class com.spark.scala.Two --files hdfs://ha-cluster-dev/test/inputs/* --master yarn --deploy-mode client Spark-Scala-0.0.1.jar
It fails with:
17/07/16 09:09:30 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File does not exist: hdfs://ha-cluster-dev/test/inputs/*
at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:134)
at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2193)
....
....
....
I have tried with hdfs://, without hdfs://, and without --files, but nothing works. Where am I going wrong?
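For completeness, the shape I am ultimately trying to get to is something like this, with the glob quoted so the shell does not expand it and passed as a plain program argument rather than through --files (a sketch of the intent, not something I have verified):

/home/hadoop/spark/bin/spark-submit --class com.spark.scala.Two --master yarn --deploy-mode client Spark-Scala-0.0.1.jar "hdfs://ha-cluster-dev/test/inputs/*"

and, inside main, reading the argument directly instead of opening it through FileSystem:

// args(0) is the quoted glob; spark.read.json expands globs itself
val df = spark.read.json(args(0))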