We are using Cloudera's distribution of Hadoop. We have a working cluster with 10 nodes. I am trying to connect to the cluster from a remote host using IntelliJ, with Scala and Spark.
I imported the following libraries via sbt:
libraryDependencies += "org.scalatestplus.play" %% "scalatestplus-play" % "3.1.2" % Test
libraryDependencies += "com.h2database" % "h2" % "1.4.196"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"
I am trying to create a SparkSession with the following code:
val spark = SparkSession
.builder()
.appName("API")
.config("spark.sql.warehouse.dir", "/user/hive/warehouse")
.config("hive.metastore.uris","thrift://VMClouderaMasterDev01:9083")
.master("spark://10.150.1.22:9083")
.enableHiveSupport()
.getOrCreate()
But I get the following error:
[error] o.a.s.n.c.TransportResponseHandler - Still have 1 requests
outstanding when connection from /10.150.1.22:9083 is closed
[warn] o.a.s.d.c.StandaloneAppClient$ClientEndpoint - Failed to connect to
master 10.150.1.22:9083
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
......
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection from /10.150.1.22:9083 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
To be honest, I also tried connecting on different ports (8022, 9023), but it did not work. I saw that the default port is 7077, but there is no process listening on port 7077 on the master.
Any idea how I can proceed? How can I check which port the master is listening on for these kinds of connections?
Answer 0 (score: 1)
If you are using a Hadoop cluster, you should not be running a standalone Spark master at all; you should be using YARN:
master("yarn")
In that case, you must export the HADOOP_CONF_DIR environment variable, pointing at a directory that contains a copy of yarn-site.xml from the cluster.
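For reference, a minimal sketch of what the session builder could look like against YARN, reusing the warehouse and metastore settings from the question (the host name and paths are the asker's and are not verified here):

// Minimal sketch, assuming HADOOP_CONF_DIR points at a local copy of the
// cluster's yarn-site.xml and core-site.xml, e.g.
//   export HADOOP_CONF_DIR=/path/to/cluster-conf
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("API")
  .master("yarn")  // submit to YARN instead of a standalone Spark master
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("hive.metastore.uris", "thrift://VMClouderaMasterDev01:9083")
  .enableHiveSupport()
  .getOrCreate()

When launching in yarn mode directly from the IDE rather than through spark-submit, the spark-yarn artifact (e.g. "org.apache.spark" %% "spark-yarn" % "2.2.0") usually also has to be on the classpath; whether that applies here depends on how the job is started.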