My code:
import java.io.PrintWriter

val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", header)
  .load("/input/du3_init.dat")
val dfCI2 = df.select("CI2")
dfCI2.printSchema()
val path = "hdfs://nameservice/user/CI2_Schema"
// Write the schema tree to the HDFS path (this is the line that throws)
new PrintWriter(path) { write(dfCI2.schema.treeString); close() }
When I run the Spark job, I get:
Exception in thread "main" java.io.FileNotFoundException: hdfs:/nameservice/user/CI2_Schema (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
The HDFS path shown in the exception contains only a single slash. How can I fix this? Thanks in advance.
Answer (score: 1)
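If you want to write to HDFS, you cannot use PrintWriter. PrintWriter does not understand network paths such as hdfs:// or ftp:// paths; it works only with the local file system. Because it treats your string as a local path, the double slash is collapsed and the result is resolved relative to the working directory, which is why the exception shows hdfs:/nameservice/user/CI2_Schema.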
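The single slash can be reproduced outside Spark. PrintWriter(String) opens a FileOutputStream on a java.io.File, and java.io.File normalizes its argument as a local path, removing repeated separators. A minimal sketch, assuming a Unix-like OS (Windows normalizes differently); PathNormalizationDemo and localPathOf are hypothetical names used only for illustration:

```scala
import java.io.File

object PathNormalizationDemo {
  // java.io.File interprets the string as a local path and normalizes it,
  // collapsing "//" into "/", so the URI meaning of "hdfs://" is lost.
  def localPathOf(name: String): String = new File(name).getPath

  def main(args: Array[String]): Unit = {
    val p = localPathOf("hdfs://nameservice/user/CI2_Schema")
    println(p) // on Linux/macOS: hdfs:/nameservice/user/CI2_Schema
    // Not absolute: it is resolved against the current working directory,
    // where no directory named "hdfs:" exists, hence FileNotFoundException.
    println(new File(p).isAbsolute)
  }
}
```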
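Instead, get the Hadoop configuration from the Spark context and write through the Hadoop FileSystem API, which does understand hdfs:// paths: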
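import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}
import java.io.BufferedOutputStream
import java.nio.charset.StandardCharsets

// Reuse the Hadoop configuration Spark already loaded (core-site.xml, hdfs-site.xml)
val hdfsConf = sqlContext.sparkContext.hadoopConfiguration
val filePath = new Path("hdfs://nameservice/user/CI2_Schema")
// Resolve the file system from the path's scheme, so hdfs:// maps to HDFS
val fileSystem: FileSystem = filePath.getFileSystem(hdfsConf)
// create(Path) overwrites an existing file by default
val hdfsFileOS: FSDataOutputStream = fileSystem.create(filePath)
// create a buffered output stream using the FSDataOutputStream
val bos = new BufferedOutputStream(hdfsFileOS)
try {
  bos.write(dfCI2.schema.treeString.getBytes(StandardCharsets.UTF_8))
} finally {
  bos.close() // flushes the buffer and closes the underlying HDFS stream
}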
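If the schema has to be written from several places, the buffered write can be wrapped in a small helper that always closes the stream, even when the write fails. A minimal sketch; SchemaWriter and writeUtf8 are hypothetical names, and an in-memory ByteArrayOutputStream stands in for HDFS so the sketch runs without a cluster. The schema string in main is made up for illustration:

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

object SchemaWriter {
  // Writes `content` as UTF-8 to the stream returned by `open`, always closing it.
  // `open` is a hypothetical hook: for HDFS it would return fileSystem.create(path).
  def writeUtf8(open: () => OutputStream, content: String): Unit = {
    val out = new BufferedOutputStream(open())
    try out.write(content.getBytes(StandardCharsets.UTF_8))
    finally out.close() // flushes the buffer and closes the wrapped stream
  }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for an HDFS output stream
    val sink = new ByteArrayOutputStream()
    writeUtf8(() => sink, "root\n |-- CI2: string (nullable = true)\n")
    print(sink.toString("UTF-8"))
  }
}
```

Against HDFS, the factory would return the Hadoop stream instead, for example SchemaWriter.writeUtf8(() => fileSystem.create(filePath), dfCI2.schema.treeString). Passing a factory rather than an already-open stream keeps the open and close in one place.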