I followed the instructions at https://www.linode.com/docs/databases/hadoop/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/ to set up a 3-node cluster, and I am able to run spark-shell. But when I try to run pyspark, I get the following output:
hadoop@master:~$ pyspark
Python 3.7.1 (default, Dec 14 2018, 19:28:38)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/02/15 21:51:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/15 21:51:06 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/02/15 21:51:12 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
and then the screen freezes (no further messages appear). I don't know how to resolve this.
PS: As described in the link, I first deployed a 3-node Hadoop YARN cluster, and then installed Spark on the master node (after running start-yarn.sh).
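Update: I suspect the spark.yarn.jars / spark.yarn.archive warning might be related, so here is what I'm considering trying, based on the "Running Spark on YARN" section of the Spark docs. The HDFS path /spark-jars is my own choice (not from the tutorial), and I'm not sure this warning is actually what causes the freeze:

# Pack Spark's jars into an uncompressed archive and upload it to HDFS
# (so YARN doesn't re-upload everything under SPARK_HOME on each run)
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
hdfs dfs -mkdir -p /spark-jars
hdfs dfs -put spark-libs.jar /spark-jars/

# Then point YARN at the archive in $SPARK_HOME/conf/spark-defaults.conf:
#   spark.yarn.archive  hdfs:///spark-jars/spark-libs.jar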