I'm new to the big data environment and have just started by installing a 3-node Hadoop 2.6 cluster with HA capability using ZooKeeper.
Everything works well so far: I have tested the failover scenario between NN1 and NN2 using ZooKeeper, and it works correctly.
Now I would like to install Apache Spark on my Hadoop YARN cluster, also with HA capability.
Can anyone guide me through the installation steps? I could only find instructions for setting up Spark in standalone mode, which I have already done successfully. Now I want to install it on the YARN cluster along with HA capability.
I have a three-node cluster (NN1, NN2, DN1); the following daemons are currently running on each of these servers:
Daemons running on the master NameNode (NN1):
Jps
DataNode
DFSZKFailoverController
JournalNode
ResourceManager
NameNode
QuorumPeerMain
NodeManager
Daemons running on the standby NameNode (NN2):
Jps
DFSZKFailoverController
NameNode
QuorumPeerMain
NodeManager
JournalNode
DataNode
Daemons running on the DataNode (DN1):
QuorumPeerMain
Jps
DataNode
JournalNode
NodeManager
Answer 0 (score: 0)
You should set up ResourceManager HA (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html). When running on YARN, Spark does not run its own daemon processes, so in YARN mode there is no Spark component that needs HA of its own.
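As a rough sketch, ResourceManager HA is enabled in `yarn-site.xml`. The snippet below assumes the second ResourceManager runs on NN2, that your ZooKeeper quorum listens on the default port 2181 on all three nodes, and that `my-yarn-cluster` is a placeholder cluster ID; adjust these to your environment.

```xml
<!-- yarn-site.xml: ResourceManager HA sketch (hostnames/ports are assumptions) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>my-yarn-cluster</value> <!-- placeholder ID -->
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>NN1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>NN2</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>NN1:2181,NN2:2181,DN1:2181</value> <!-- assumes default ZK port -->
</property>
```

After restarting both ResourceManagers, you can check which one is active with `yarn rmadmin -getServiceState rm1`.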
Answer 1 (score: 0)
You can run Spark in YARN mode; in YARN mode you configure the driver and executors according to your cluster capacity, for example:

spark.executor.memory <value>

Allocate the number of executors according to your YARN container memory.
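A minimal sketch of submitting a job to YARN, assuming `SPARK_HOME` and `HADOOP_CONF_DIR` are set so Spark can find your cluster configuration; the memory, core, and executor counts are illustrative values you should size to your YARN container limits, and the examples-jar path varies by Spark version.

```shell
# Submit the bundled SparkPi example to YARN in cluster mode.
# All resource numbers below are assumptions; tune them to your cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # assumed config location

"$SPARK_HOME"/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 2g \
  --num-executors 2 \
  --executor-cores 2 \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/lib/spark-examples*.jar 100   # path differs by Spark version
```

With `--deploy-mode cluster`, the driver itself runs inside a YARN container, so if a node fails, YARN can re-run the application attempt; that is why no separate Spark master daemon (and hence no Spark-side HA) is needed.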