Is it possible to configure Apache Livy to run with Spark Standalone?

Asked: 2016-11-29 22:02:01

Tags: hadoop apache-spark

On the machine where I installed Apache Livy (on Ubuntu 16.04):

(a) Is it possible to run it in Spark Standalone mode?

I am considering using Spark 1.6.3 pre-built for Hadoop 2.6, available for download from https://spark.apache.org/downloads.html

(b) If so, how do I configure it?

(c) What should HADOOP_CONF_DIR be for Spark Standalone? The link https://github.com/cloudera/livy mentions the following environment variables:

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

I have built Livy successfully except for the last task, which is pending on the Spark installation:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] livy-api ........................................... SUCCESS [  9.984 s]
[INFO] livy-client-common ................................. SUCCESS [  6.681 s]
[INFO] livy-test-lib ...................................... SUCCESS [  0.647 s]
[INFO] livy-rsc ........................................... SUCCESS [01:08 min]
[INFO] livy-core_2.10 ..................................... SUCCESS [  7.225 s]
[INFO] livy-repl_2.10 ..................................... SUCCESS [02:42 min]
[INFO] livy-core_2.11 ..................................... SUCCESS [ 56.400 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [03:06 min]
[INFO] livy-server ........................................ SUCCESS [02:12 min]
[INFO] livy-assembly ...................................... SUCCESS [ 15.959 s]
[INFO] livy-client-http ................................... SUCCESS [ 25.377 s]
[INFO] livy-scala-api_2.10 ................................ SUCCESS [ 40.336 s]
[INFO] livy-scala-api_2.11 ................................ SUCCESS [ 40.991 s]
[INFO] minicluster-dependencies_2.10 ...................... SUCCESS [ 24.400 s]
[INFO] minicluster-dependencies_2.11 ...................... SUCCESS [  5.489 s]
[INFO] livy-integration-test .............................. SUCCESS [ 37.473 s]
[INFO] livy-coverage-report ............................... SUCCESS [  3.062 s]
[INFO] livy-examples ...................................... SUCCESS [  6.841 s]
[INFO] livy-python-api .................................... FAILURE [  8.053 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:59 min
[INFO] Finished at: 2016-11-29T13:14:10-08:00
[INFO] Final Memory: 76M/2758M
[INFO] ------------------------------------------------------------------------

Thanks.

3 answers:

Answer 0 (score: 3)

For future reference, here are the detailed steps you need to follow (on Ubuntu):

  1. Install JDK 8
  2. Install Spark (spark-2.4.5-bin-hadoop2.7.tgz) or (spark-2.4.5-bin-without-hadoop-scala-2.12.tgz)
  3. Install Livy (apache-livy-0.7.0-incubating-bin.zip)
  4. Add the following variables to .bashrc:

    export JAVA_HOME="/lib/jvm/jdk1.8.0_251"
    export PATH=$PATH:$JAVA_HOME/bin

    export SPARK_HOME=/opt/hadoop/spark-2.4.5-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

    export LIVY_HOME=/opt/hadoop/apache-livy-0.7.0-incubating-bin
    export PATH=$PATH:$LIVY_HOME/bin

    export HADOOP_CONF_DIR=/etc/hadoop/conf    # <--- (optional)

  5. In $LIVY_HOME we need to create a folder named "logs" and grant it the necessary permissions, otherwise an error is shown when we start "livy-server".

  6. Now run start-master.sh (found in Spark's sbin folder).

  7. Now run start-slave.sh (the master URL can be obtained after performing step 6 and visiting localhost:8080).
  8. Now start "livy-server", which is in Livy's bin folder (see the command sketch below).
  9. The Livy UI is now reachable at localhost:8998.
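
A minimal sketch of steps 5–8, assuming the paths from the .bashrc above; the hostname abhishek-desktop is simply the example machine used further below:

    # step 5: create the logs folder Livy expects
    mkdir -p $LIVY_HOME/logs

    # step 6: start the standalone master; its UI is at http://localhost:8080
    $SPARK_HOME/sbin/start-master.sh

    # step 7: start a worker against the master URL shown on that UI
    $SPARK_HOME/sbin/start-slave.sh spark://abhishek-desktop:7077

    # step 8: start the Livy server; the Livy UI is then at http://localhost:8998
    $LIVY_HOME/bin/livy-server start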

10. There are many REST endpoints available: https://livy.incubator.apache.org/docs/latest/rest-api.html

11. If you are interested in running a JAR, use a Batch instead of a Session.

12. Create a simple Spark application that takes the master for its conf as a dynamic argument (so that you can pass in the master URL).

  1. Try to use these versions to match the installed Spark version (if spark-2.4.5-bin-hadoop2.7.tgz is installed):

    scalaVersion := "2.11.12"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"

  2. JDK 8 is a must. (JDK 11 causes issues with Scala 2.11.12 and Spark 2.4.5.)

  3. If I keep the JAR file on the Desktop, a normal spark-submit command now looks like this:

spark-submit --class com.company.Main file:///home/user_name/Desktop/scala_demo.jar spark://abhishek-desktop:7077

  1. For Livy:

POST localhost:8998/batches

{
   "className": "com.company.Main",
   "executorMemory": "20g",
   "args": [
       "spark://abhishek-desktop:7077"
   ],
   "file": "local:/home/user_name/Desktop/scala_demo.jar"
}
  1. Executing the above returns a running state; we just need to go to localhost:8998 and check the logs to see the result.
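
As a sketch, the same batch can be submitted and monitored from the command line with curl (the batch id 0 below is just an assumed example):

    curl -X POST -H "Content-Type: application/json" \
         -d '{"className": "com.company.Main",
              "executorMemory": "20g",
              "args": ["spark://abhishek-desktop:7077"],
              "file": "local:/home/user_name/Desktop/scala_demo.jar"}' \
         http://localhost:8998/batches

    # check the state and logs of the batch (assuming it was assigned id 0)
    curl http://localhost:8998/batches/0
    curl http://localhost:8998/batches/0/log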

Answer 1 (score: 1)

Make sure you have set

export SPARK_HOME=/path/to/spark/home

and then run mvn -DskipTests package.

It should work even without HADOOP_CONF_DIR.
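
For example, a minimal sketch assuming the Livy sources were cloned from https://github.com/cloudera/livy and the asker's Spark 1.6.3 package was unpacked under /opt (both paths are placeholders):

    # point the build at the local Spark installation, then package Livy
    export SPARK_HOME=/opt/spark-1.6.3-bin-hadoop2.6
    cd livy
    mvn -DskipTests package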

Answer 2 (score: 1)

It is probably a missing Python module. Take a look at the log of the failure:

Traceback (most recent call last):
  File "setup.py", line 18, in <module>
    from setuptools import setup
ImportError: No module named setuptools

In that case, you need to install the setuptools module:

pip install setuptools
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/20/d7/04a0b689d3035143e2ff288f4b9ee4bf6ed80585cc121c90bfd85a1a8c2e/setuptools-39.0.1-py2.py3-none-any.whl (569kB)
    100% |████████████████████████████████| 573kB 912kB/s 
Installing collected packages: setuptools
Successfully installed setuptools-20.7.0