I set up a Spark-on-YARN cluster environment and tried to use Spark SQL from spark-shell:
spark-shell --master yarn --deploy-mode client --conf spark.yarn.archive=hdfs://hadoop_273_namenode_ip:namenode_port/spark-archive.zip
It is worth mentioning that Spark is running on Windows 7. After spark-shell started successfully, I executed the following commands:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df_mysql_address = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://mysql_db_ip/db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "ADDRESS").option("user", "root").option("password", "root").load()
scala> df_mysql_address.show
scala> df_mysql_address.write.format("parquet").saveAsTable("address_local")
"显示"命令正确返回结果集,但" saveAsTable"以失败告终。错误消息显示:
java.io.IOException: Mkdirs failed to create file:/C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse/address_local/_temporary/0/_temporary/attempt_20171018104423_0001_m_000000_0 (exists=false, cwd=file:/tmp/hadoop/nm-local-dir/usercache/hduser/appcache/application_1508319604173_0005/container_1508319604173_0005_01_000003)
I hoped and assumed that the table would be saved on the Hadoop cluster, but as you can see, the directory (C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse) is a folder on my Windows 7 machine, not a folder in HDFS, and not even on the Hadoop Ubuntu machines.
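By default, saveAsTable writes managed tables under spark.sql.warehouse.dir, which here appears to resolve to a file: path on the Windows driver, so the executors running in the YARN containers fail to create it. A quick way to inspect that setting from the shell (a minimal check, reusing the sqlContext created above):

scala> sqlContext.getConf("spark.sql.warehouse.dir")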
What should I do? Please advise, thanks.
Answer 0 (score: 0)
The way to solve the problem is to provide a "path" option before the save operation, as shown below:
scala> df_mysql_address.write.option("path", "/spark-warehouse").format("parquet").saveAsTable("address_local")
Thanks @philantrovert.