I'm trying to run the Spark Thrift Server on CDH 5.3.0. I'm following the Spark SQL instructions, but I can't even get the --help
option to run successfully. In the output below, it dies because it can't find the HiveServer2 class.
$ /usr/lib/spark/sbin/start-thriftserver.sh --help
Usage: ./sbin/start-thriftserver [options] [thrift server options]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place
on the PYTHONPATH for Python apps.
--files FILES Comma-separated list of files to be placed in the working
directory of each executor.
--conf PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512M).
--driver-java-options Extra Java options to pass to the driver.
--driver-library-path Extra library path entries to pass to the driver.
--driver-class-path Extra class path entries to pass to the driver. Note that
jars added with --jars are automatically included in the
classpath.
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--help, -h Show this help message and exit
--verbose, -v Print additional debug output
Spark standalone with cluster deploy mode only:
--driver-cores NUM Cores for driver (Default: 1).
--supervise If given, restarts the driver on failure.
Spark standalone and Mesos only:
--total-executor-cores NUM Total cores for all executors.
YARN-only:
--executor-cores NUM Number of cores per executor (Default: 1).
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").
--num-executors NUM Number of executors to launch (Default: 2).
--archives ARCHIVES Comma separated list of archives to be extracted into the
working directory of each executor.
Thrift server options:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/server/HiveServer2
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.server.HiveServer2
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more
Answer 0 (score: 2)
As the error says, the class is not on the classpath. Unfortunately, setting the CLASSPATH environment variable does not work. The only solution I could find was to edit /usr/lib/spark/bin/compute-classpath.sh
and add this line (it can go anywhere, but add it at the end to make it clear it is an addition):
CLASSPATH="$CLASSPATH:/usr/lib/hive/lib/*"
Cloudera's release notes for 5.3.0 explicitly state that "Spark SQL remains an experimental and unsupported feature in CDH", so it is perhaps not surprising that a tweak like this is needed. Also, this response to a similar problem in CDH 5.2 indicates that Cloudera deliberately excludes the Hive jars for size reasons.
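If you want to confirm that the class really is shipped in those Hive jars before editing the script, a quick check along these lines should do it (the hive-service jar name is an assumption; the version string on your system may differ):
# Hypothetical check: look for HiveServer2 inside the Hive service jars
for j in /usr/lib/hive/lib/hive-service*.jar; do
  unzip -l "$j" | grep -q 'org/apache/hive/service/server/HiveServer2.class' && echo "found in $j"
done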
Answer 1 (score: 1)
I ran into the same problem, but I solved it in a different way.
My cloudera CDH version was not 5.3.0, it was an earlier release, so you may find the paths slightly different.
The simple solution is to replace the spark-assembly-*.jar file that ships with cloudera CDH with one from another build.
I downloaded spark from the official download page. The version I downloaded was built for hadoop 2.4 and later. Extract the downloaded archive and look for spark-assembly-*.jar.
In the cloudera installation I looked for the same file and found it under the path /usr/lib/spark/lib/spark-assembly-*.jar.
That path is actually a symbolic link to the real file. I copied the jar from the spark download to the same directory and made the symbolic link point to the new jar (ln -f -s target link).
Everything then worked for me.
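A minimal sketch of that swap, assuming the Spark 1.2.0 build for Hadoop 2.4 was downloaded (the jar file name and the spark-assembly.jar symlink name are assumptions; use whatever your download and CDH install actually contain):
# Example only: copy the downloaded assembly next to the CDH one and repoint the symlink
cp ~/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar /usr/lib/spark/lib/
cd /usr/lib/spark/lib
ln -f -s spark-assembly-1.2.0-hadoop2.4.0.jar spark-assembly.jar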
Answer 2 (score: 1)
/usr/lib/spark/bin/compute-classpath.sh sets CLASSPATH="$SPARK_CLASSPATH". On a parcel-based CDH installation you can add the hive jars to SPARK_CLASSPATH like this:
SPARK_CLASSPATH=$(ls -1 /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar | sed -e :a -e 'N;s/\n/:/;ta') /opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh --help
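The sed expression just joins the ls output into a single colon-separated string. Since the parcel paths contain no spaces, the same list can presumably be built more simply, for example:
# Equivalent colon-joined jar list built with a shell glob and tr instead of sed
SPARK_CLASSPATH=$(echo /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar | tr ' ' ':')
/opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh --help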
Answer 3 (score: 0)
Instructions from the Cloudera community forums (http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/CDH-5-5-does-not-have-Spark-Thrift-Server/m-p/41849#M1758):
git clone https://github.com/cloudera/spark.git
cd spark
./make-distribution.sh -DskipTests \
-Dhadoop.version=2.6.0-cdh5.7.0 \
-Phadoop-2.6 \
-Pyarn \
-Phive -Phive-thriftserver \
-Pflume-provided \
-Phadoop-provided \
-Phbase-provided \
-Phive-provided \
-Pparquet-provided
-Phive and -Phive-thriftserver are the key pieces there.
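Once the build finishes, the resulting distribution should contain the Thrift Server script and classes. A hedged example of starting it from the build output (the dist/ output directory and the port are assumptions about the local setup):
# make-distribution.sh is expected to write its output to dist/ in the source tree
cd dist
./sbin/start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10001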
There is a request to add the Spark Thrift Server to CDH at https://issues.cloudera.org/browse/DISTRO-817; vote for it if you would like to see it included.