I am trying to connect to HDFS from spark-shell.
I am using Spark 2.4.3, Scala 2.11.12, and Hadoop 3.1.2.
The code in spark-shell:
scala> val rdd = sc.textFile("hdfs://localhost:8020/tmp/1.json")
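As a sanity check (assuming spark-shell might not be reading the same Hadoop configuration as the CLI), the default filesystem the driver resolves can be printed:

scala> // which default filesystem does the Spark driver actually see?
scala> sc.hadoopConfiguration.get("fs.defaultFS")

If this returns file:/// instead of hdfs://localhost:8020, spark-shell is not picking up core-site.xml.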
The configuration in Hadoop's core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
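As an aside, my understanding (an assumption, not verified here) is that if HADOOP_CONF_DIR points at the directory containing this core-site.xml before spark-shell starts, the scheme and port can be dropped from the path, since fs.defaultFS supplies them:

scala> // relies on fs.defaultFS from core-site.xml being visible to Spark
scala> val rdd = sc.textFile("/tmp/1.json")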
I can also fetch the file from the shell:
gary@localhost hadoop fs -cat "hdfs://localhost:8020/tmp/1.json"
2019-09-12 10:19:45,021 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"app":"优土","mac":"C3E302F6B285DA6BA955AA4996E9E568"}%
But it fails in spark-shell:
scala> val rdd = sc.textFile("hdfs://localhost:8020/tmp/1.json")
rdd: org.apache.spark.rdd.RDD[String] = hdfs://localhost:8020/tmp/1.json MapPartitionsRDD[1] at textFile at <console>:24
scala> rdd.count()
java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
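Since this is a plain Connection refused, one more check that comes to mind (a sketch; the host and port are just taken from my config) is whether anything is listening on localhost:8020 as seen from the Spark JVM itself. A difference here versus the hadoop CLI could point to IPv4/IPv6 localhost resolution:

scala> import java.net.{InetSocketAddress, Socket}
scala> val s = new Socket()
scala> // 2-second timeout; throws ConnectException if nothing accepts on the port
scala> s.connect(new InetSocketAddress("localhost", 8020), 2000)
scala> s.close()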