Loading YarnCluster via dask-yarn causes a Java error

Asked: 2019-05-28 16:00:37

Tags: python yarn dask

I am trying to set up and run Dask-Yarn following the instructions at https://yarn.dask.org/en/latest/quickstart.html#usage.

I packed my conda environment into the file environment.tar.gz using conda-pack, and then tried to run the following in Python (from the same folder):
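For reference, packaging the environment with conda-pack looks roughly like this. The environment name `dask_yarn` is taken from the traceback paths below and is an assumption about the questioner's setup:

```shell
# Install conda-pack into the environment, then archive it.
# "dask_yarn" is assumed to be the name of the conda environment in use.
conda install -n dask_yarn conda-pack
conda pack -n dask_yarn -o environment.tar.gz
```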

python
>>> from dask_yarn import YarnCluster
>>> cluster = YarnCluster(environment='environment.tar.gz')

This produced the Java error pasted below.

19/05/28 15:45:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/28 15:45:40 ERROR skein.Driver: Error running Driver
java.lang.UnsupportedClassVersionError: com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:278)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:363)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
        at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2750)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2777)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2794)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2812)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
        at com.anaconda.skein.Driver.getFs(Driver.java:304)
        at com.anaconda.skein.Driver.run(Driver.java:279)
        at com.anaconda.skein.Driver.main(Driver.java:174)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/.conda/envs/dask_yarn/lib/python3.7/site-packages/dask_yarn/core.py", line 295, in __init__
    self._start_cluster(spec, skein_client)
  File "/home/.conda/envs/dask_yarn/lib/python3.7/site-packages/dask_yarn/core.py", line 339, in _start_cluster
    skein_client = _get_skein_client(skein_client)
  File "/home/.conda/envs/dask_yarn/lib/python3.7/site-packages/dask_yarn/core.py", line 46, in _get_skein_client
    return skein.Client(security=security)
  File "/home/.conda/envs/dask_yarn/lib/python3.7/site-packages/skein/core.py", line 353, in __init__
    java_options=java_options)
  File "/home/.conda/envs/dask_yarn/lib/python3.7/site-packages/skein/core.py", line 266, in _start_driver
    raise DriverError("Failed to start java process")
skein.exceptions.DriverError: Failed to start java process

Some searching suggests the error is related to a version mismatch between compile time and runtime. I tried setting the environment variables shown below, but that did not work either. Is there another fix?
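The "Unsupported major.minor version 52.0" message encodes exactly which mismatch this is: each Java release bumps the class-file major version, and 52 corresponds to Java 8, while the OpenJDK 1.7 runtime configured below only supports up to major version 51. A small sketch of the mapping (the table values are from the JVM specification; the helper function is illustrative, not part of any library):

```python
# Class-file major versions per Java release (from the JVM specification).
# A JVM refuses to load classes with a major version newer than its own,
# which is what produces "Unsupported major.minor version 52.0" on Java 7.
CLASS_FILE_VERSIONS = {
    49: "Java 5",
    50: "Java 6",
    51: "Java 7",
    52: "Java 8",
    53: "Java 9",
}

def required_java(major: int) -> str:
    """Translate a class-file major version into the Java release it needs."""
    return CLASS_FILE_VERSIONS.get(major, f"unknown (major {major})")

print(required_java(52))  # Java 8 -- the GCS connector jar was built for this
print(required_java(51))  # Java 7 -- the runtime JVM set via JAVA_HOME below
```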

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

1 Answer:

Answer 0 (score: 0)

The fix for this was to use Java 1.8 instead of Java 1.7. For an example, see this.
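Concretely, that means repointing the environment variables from the question at a Java 8 JDK. The install path below is typical for OpenJDK 8 on RHEL/CentOS systems and is an assumption; adjust it to wherever Java 8 lives on your cluster:

```shell
# Switch JAVA_HOME from the 1.7 JDK to a Java 8 JDK (path is an assumption).
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

# Confirm the active JVM before retrying YarnCluster(...):
java -version   # should report version 1.8.x
```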