Installing the graphframes package on an offline Spark cluster

Date: 2018-10-31 12:14:56

Tags: apache-spark package graphframes

I have an offline pyspark cluster (with no internet access) and need to install the graphframes library.

I manually downloaded the jar from here and added it to $SPARK_HOME/jars/, but when I try to use it I get the following error:

error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term typesafe in package com,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access term scalalogging in value com.typesafe,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.
error: missing or invalid dependency detected while loading class file 'Logging.class'.
Could not access type LazyLogging in value com.slf4j,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.

What is the right way to install all of the dependencies offline?

1 Answer:

Answer 0 (score: 1):

I managed to install the graphframes library. First, I identified the graphframes dependencies, which are:

scala-logging-api_xx-xx.jar
scala-logging-slf4j_xx-xx.jar

where xx is the appropriate Scala and jar version. Then I installed them in the correct path. Since I work on a Cloudera machine, the correct path is:

/opt/cloudera/parcels/SPARK2/lib/spark2/jars/

If you cannot place them in that directory on your cluster (because you do not have root permissions and your admin is super lazy), you can simply add them in your spark-submit/spark-shell call:

spark-submit ..... --driver-class-path /path-for-jar/  \
                   --jars /../graphframes-0.5.0-spark2.1-s_2.11.jar,/../scala-logging-slf4j_2.10-2.1.2.jar,/../scala-logging-api_2.10-2.1.2.jar

This works for Scala. In order to use graphframes with Python, you need to download the graphframes jar and then, through a shell:

# Extract JAR content
jar xf graphframes_graphframes-0.3.0-spark2.0-s_2.11.jar
# Enter the folder
cd graphframes
# Zip the contents
zip graphframes.zip -r *

Then add the zipped file to the Python path in your spark-env.sh or bash_profile:

export PYTHONPATH=$PYTHONPATH:/..proper path/graphframes.zip:.
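
If you cannot edit spark-env.sh or your profile either, a possible runtime alternative (not part of the original answer; a minimal sketch that assumes the zip sits at a hypothetical /tmp/graphframes.zip and that the jars are already on the JVM classpath via the jars directory or --jars) is to ship the zip from the driver with addPyFile:

from pyspark.sql import SparkSession

# Assumes the graphframes/scala-logging jars are already visible to the JVM;
# addPyFile only covers the Python side of the package.
spark = SparkSession.builder.appName("graphframes-offline").getOrCreate()

# Distribute the zipped Python package to the executors and put it on the
# driver's sys.path for this session, instead of editing spark-env.sh.
spark.sparkContext.addPyFile("/tmp/graphframes.zip")  # hypothetical path

from graphframes import GraphFrame  # should resolve after addPyFile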

Then open the shell / submit (again with the same arguments as for Scala), and importing graphframes works normally.
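
As a quick sanity check that both the jar and the zipped Python package are visible, a sketch along these lines should run in pyspark; the vertex and edge data below are made up purely for illustration:

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graphframes-check").getOrCreate()

# Toy graph: GraphFrames expects an "id" column on vertices
# and "src"/"dst" columns on edges.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"])
edges = spark.createDataFrame(
    [("a", "b", "friend"), ("b", "c", "follow")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()  # runs only if both the jar and the Python package are found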

This link was very useful for this solution.
