Windows: Apache Spark History Server Configuration

Date: 2016-07-17 08:53:45

Tags: windows git bash apache-spark apache-spark-sql

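I want to use Spark's History Server to take advantage of the logging mechanism of my Web UI, but I am having some difficulty getting it to run on my Windows machine.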

I did the following:

I set up my spark-defaults.conf file to contain:

spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs

My spark-env.sh contains:

SPARK_LOG_DIR    "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS   "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"

I am using Git-BASH to run the start-history-server.sh file, like this:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh

And I get this error:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
  Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
  ========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
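The two "command not found" lines (spark-env.sh lines 69 and 70) point to a shell-syntax problem: bash sources conf/spark-env.sh, and a line of the form NAME "value" is parsed as running a command called NAME. Below is a minimal sketch of the same two settings written as bash assignments, assuming the paths from the question. SPARK_LOG_DIR is given as a plain directory path rather than a file:// URI, on the assumption that the launch script uses it as a local directory (the "logging to C:\spark-1.6.2-bin-hadoop2.6/logs/..." line above shows it in that form).

```shell
# conf/spark-env.sh is sourced by bash, so each setting must be a NAME=value
# assignment with no spaces around '='. Written as NAME "value", bash tries to
# run NAME as a command, which is what produces "command not found".
# 'export' also makes the variables visible to child processes.

# Assumption: a plain local directory path, not a file:// URI.
export SPARK_LOG_DIR="C:/spark-1.6.2-bin-hadoop2.6/logs"

# Java system property for the history server; the URI uses three slashes
# (empty host, then the drive letter as the start of the path).
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file:///C:/spark-1.6.2-bin-hadoop2.6/logs"
```

This only addresses the two "command not found" lines. The "ps: unknown option -- o" messages come from the launch script calling ps with an -o option that the Git Bash ps does not accept, which fits with the first answer below starting the server through spark-class.cmd instead.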

The full log output is below:

Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================

I am running a SparkR script in which I initialize my Spark context and then call init().

Should I start the history server before running the Spark script?

Any pointers and tips (on logging) would be greatly appreciated.

2 Answers:

Answer 0 (score: 4)

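On Windows you need to run Spark's .cmd files, not the .sh ones. From what I have seen, there is no .cmd script for the Spark history server, so it basically has to be run manually.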

I followed the history server Linux script; to run the server manually on Windows, take the following steps:

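  • All history server configuration should be set in the spark-defaults.conf file (remove the .template suffix), as described below.
  • Go to the Spark config directory and add the spark.history.* configuration to %SPARK_HOME%/conf/spark-defaults.conf, as follows: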

    spark.eventLog.enabled true
    spark.history.fs.logDirectory file:///c:/logs/dir/path

  • Once the configuration is done, run the following command from %SPARK_HOME%:

    bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

  • The output should look something like this:

    16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
    16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
    16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
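The two configuration steps in this answer can also be scripted from Git Bash. The sketch below assumes a Spark directory that ships conf/spark-defaults.conf.template; to_file_uri and configure_history_server are hypothetical helper names (not part of Spark), and C:\logs\dir\path is the answer's placeholder directory. It writes the log directory as a three-slash URI, as in the answer's file:///c:/logs/dir/path, and also sets spark.eventLog.dir (taken from the question's configuration) so that applications write their event logs where the history server reads them.

```shell
# Hypothetical helper: convert a Windows path such as C:\logs\dir\path into
# file:///C:/logs/dir/path. Three slashes: the host part is empty and the
# drive letter begins the path. Characters such as spaces are NOT
# percent-encoded, so this suits simple paths only.
to_file_uri() {
  local p="${1//\\//}"   # replace every backslash with a forward slash
  printf 'file:///%s\n' "$p"
}

# Hypothetical helper: create conf/spark-defaults.conf from the template
# (if it does not exist yet) and append the event-log settings.
# Usage: configure_history_server SPARK_HOME WINDOWS_LOG_DIR
configure_history_server() {
  local spark_home="$1" win_log_dir="$2"
  local conf="$spark_home/conf/spark-defaults.conf"
  local uri
  uri="$(to_file_uri "$win_log_dir")"

  # Step 1: "remove the .template suffix" by copying the shipped template.
  if [ ! -f "$conf" ]; then
    cp "$spark_home/conf/spark-defaults.conf.template" "$conf" || return 1
  fi

  # Step 2: append the settings (running this twice appends them twice).
  {
    printf 'spark.eventLog.enabled true\n'
    printf 'spark.eventLog.dir %s\n' "$uri"
    printf 'spark.history.fs.logDirectory %s\n' "$uri"
  } >> "$conf"
}
```

For example, run configure_history_server /c/spark-1.6.2-bin-hadoop2.6 'C:\logs\dir\path' in Git Bash, then start the server from cmd in %SPARK_HOME% with bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer as described above, and open port 18080 in a browser.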

Hope it helps! :-)

Answer 1 (score: 0)

In case anyone gets the following exception:

17/05/12 20:27:50 ERROR FsHistoryProvider: Exception encountered when attempting to load application log file:/C:/Spark/Logs/spark--org.apache.spark.deploy.history.HistoryServer-1-Arsalan-PC.out
java.lang.IllegalArgumentException: Codec [out] is not available. Consider setting spark.io.compression.codec=snappy
        at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(Com

Go to SparkHome/conf/spark-defaults.conf and set:

    spark.eventLog.compress false
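In the trace above, the file being loaded is the history server's own .out log, which sits in the same C:/Spark/Logs directory that the history server scans for application logs. The message "Codec [out] is not available" suggests that the text after the last dot of that file name was read as a compression codec name. This snippet only mimics that suffix extraction for illustration; it is not Spark's code.

```shell
# Illustration only: take the text after the last '.' of the file name from
# the trace, which matches the codec name reported in the error message.
name="spark--org.apache.spark.deploy.history.HistoryServer-1-Arsalan-PC.out"
suffix="${name##*.}"                    # strip everything up to and including the last '.'
printf 'codec suffix: %s\n' "$suffix"   # prints: codec suffix: out
```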