I am trying to set up standalone Hadoop 3.3 and Spark 3.0.1 on Windows 10. I added HADOOP_HOME and YARN_HOME to the Windows 10 environment variables. The Hadoop 3.3 installation succeeded: start-dfs.cmd and start-yarn.cmd both work correctly when run in administrator mode. I also installed Spark on the same localhost and set SPARK_HOME, HADOOP_CONF_DIR, and YARN_CONF_DIR in the environment variables. I then configured Spark in the spark-defaults.conf file under %SPARK_HOME%\conf as follows:
spark.driver.host localhost
spark.yarn.jars file:///C:/spark-3.0.1-bin-hadoop3.2/jars/*.jar
spark.master spark://localhost:7077
spark.eventLog.enabled true
spark.eventLog.dir file:///C:/spark-3.0.1-bin-hadoop3.2/sparkeventlogs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.yarn.am.memory 1g
spark.executor.instances 1
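As far as I understand, spark://localhost:7077 implies a standalone Spark master process listening on that port, and start-dfs.cmd / start-yarn.cmd only start HDFS and YARN, not a Spark master. Since the sbin\*.sh scripts do not run on Windows, I believe the master and a worker would have to be launched via spark-class, roughly like this (a sketch, assuming the default port):

REM Start a standalone master; it listens on port 7077 by default
%SPARK_HOME%\bin\spark-class org.apache.spark.deploy.master.Master

REM In a second terminal, start a worker and register it with the master
%SPARK_HOME%\bin\spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077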
pyspark and spark-shell run without throwing any exception in local mode, but the same commands fail when launched with --master spark://localhost:7077 or --master yarn.
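For reference, the local-mode invocations that do work look like this (explicitly overriding the spark.master set in spark-defaults.conf above; local[*] uses all local cores):

REM Both of these start and evaluate expressions without errors
spark-shell --master local[*]
pyspark --master local[*]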
> spark-shell --master spark://localhost:7077
client.StandaloneAppClient$ClientEndpoint: Failed to connect to master localhost:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anon$1.run(StandaloneAppClient.scala:106)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
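One way to check whether anything is actually listening on the master port (netstat ships with Windows, jps with the JDK):

REM Show listeners on port 7077, if any
netstat -an | findstr :7077

REM List running Hadoop/Spark JVMs, e.g. NameNode, ResourceManager, Master
jps -l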
With --master=yarn:
> spark-shell --master=yarn
Spark context available as 'sc' (master = yarn, app id = application_1602065677057_0002).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.1
/_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.7)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 2020-10-07 19:55:58,365 WARN server.TransportChannelHandler: Exception in connection from /127.0.0.1:51292
java.io.IOException: An existing connection was forcibly closed by the remote host
at java.base/sun.nio.ch.SocketDispatcher.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:358)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
2020-10-07 19:56:00,293 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 1 for reason Container from a bad node: container_1602065677057_0002_01_000002 on host: DESKTOP-6MU8TKF.localdomain. Exit status: 1. Diagnostics: [2020-10-07 19:55:58.400]Exception from container-launch.
Container id: container_1602065677057_0002_01_000002
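The snippet above only reports exit status 1 without the container's own output; the container stdout/stderr can be pulled with the yarn CLI (assuming YARN log aggregation is enabled, otherwise the files sit under the NodeManager's local log directories), using the application id from the banner:

REM Fetch the aggregated logs for the failed application
yarn logs -applicationId application_1602065677057_0002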
It looks as if pyspark and spark-shell can only run in local mode on this machine; they fail to establish any other kind of connection. Could a Windows 10 authentication problem be causing these exceptions? Any reply would be appreciated. Best regards.