HDFS checkpoint directory

Time: 2016-12-14 17:46:58

Tags: apache-spark, persistence

For a Spark program that had already been run dozens of times, an interesting filesystem error surfaced in the following logic that sets the checkpoint dir:

val tempDir = s"alsTest"
sc.setCheckpointDir(tempDir)
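
For reference, the failure occurs inside setCheckpointDir itself, before any RDD is actually checkpointed. Below is a minimal sketch (assuming an existing SparkContext `sc`; this is an illustration, not the original program) of the equivalent path-resolution step that forces Hadoop to load its FileSystem providers:

import org.apache.hadoop.fs.Path

// Roughly the resolution step performed by SparkContext.setCheckpointDir:
// a (possibly relative) directory is turned into a Path and resolved against
// the default Hadoop FileSystem. That single getFileSystem call is what
// triggers FileSystem.loadFileSystems and instantiates every registered
// provider, including tachyon.hadoop.TFS.
val checkpointPath = new Path(tempDir)
val fs = checkpointPath.getFileSystem(sc.hadoopConfiguration)
println(fs.makeQualified(checkpointPath))   // e.g. hdfs://<namenode>/user/<user>/alsTest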

Here is the error:

org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

Here is the full stack trace:

Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2400)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2411)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
    at org.apache.spark.SparkContext$$anonfun$setCheckpointDir$2.apply(SparkContext.scala:2076)
    at org.apache.spark.SparkContext$$anonfun$setCheckpointDir$2.apply(SparkContext.scala:2074)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.SparkContext.setCheckpointDir(SparkContext.scala:2074)
    at com.blazedb.spark.ml.AlsTest$.main(AlsTest.scala:331)
    at com.blazedb.spark.ml.AlsTest.main(AlsTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ExceptionInInitializerError
    at tachyon.Constants.<clinit>(Constants.java:328)
    at tachyon.hadoop.AbstractTFS.<clinit>(AbstractTFS.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 21 more
Caused by: java.lang.RuntimeException: java.net.ConnectException: Permission denied (connect failed)
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:398)
    at tachyon.util.network.NetworkAddressUtils.getLocalHostName(NetworkAddressUtils.java:320)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:122)
    at tachyon.conf.TachyonConf.<init>(TachyonConf.java:111)
    at tachyon.Version.<clinit>(Version.java:27)
    ... 29 more
Caused by: java.net.ConnectException: Permission denied (connect failed)
    at java.net.Inet6AddressImpl.isReachable0(Native Method)
    at java.net.Inet6AddressImpl.isReachable(Inet6AddressImpl.java:77)
    at java.net.InetAddress.isReachable(InetAddress.java:502)
    at java.net.InetAddress.isReachable(InetAddress.java:461)
    at tachyon.util.network.NetworkAddressUtils.isValidAddress(NetworkAddressUtils.java:414)
    at tachyon.util.network.NetworkAddressUtils.getLocalIpAddress(NetworkAddressUtils.java:382)
    ... 33 more

Note that using the relative path alsTest had worked fine until now. Our RDD storage level is set to MEMORY_AND_SER (not OFF_HEAP). We can also verify this by looking at the contents of HDFS:
$hdfs dfs -lsr
drwxr-xr-x   - boescst supergroup          0 2016-12-13 12:43 alsTest/78081dc9-06f5-43d6-bcfb-1cfea7b4f015
drwxr-xr-x   - boescst supergroup          0 2016-12-13 12:19 alsTest/e2dd272b-19fe-4ee8-87d0-2a9afe141c9e

So why is the Spark FileSystem class now trying to access OFF_HEAP (tachyon) at all?
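
One likely explanation (an inference from the stack trace, not stated in the original post): Hadoop 2.x discovers FileSystem implementations through java.util.ServiceLoader, so every provider jar on the classpath, including Tachyon's tachyon.hadoop.TFS, is instantiated eagerly the first time any FileSystem lookup runs, regardless of whether the job ever uses OFF_HEAP storage. A minimal sketch of that discovery step:

import java.util.ServiceLoader
import org.apache.hadoop.fs.FileSystem
import scala.collection.JavaConverters._

// Iterating the ServiceLoader instantiates every FileSystem provider that is
// registered via META-INF/services on the classpath. If one of them (here
// tachyon.hadoop.TFS) throws in its static initializer, the whole lookup
// fails with the ServiceConfigurationError shown above.
ServiceLoader.load(classOf[FileSystem]).asScala.foreach { provider =>
  println(provider.getClass.getName)
}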

Update: This is getting more interesting: even explicitly specifying an hdfs URL results in the same Tachyon error:

val tempDir = s"hdfs://$host:8020:alsTest/"
sc.setCheckpointDir(tempDir)

<same error as above>

1 Answer:

Answer 0 (score: 3)

The problem was new VPN software that had been enabled on my system for the first time the day before. When the VPN software was paused, the HDFS URL resolved correctly again.
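
For completeness, here is a small diagnostic sketch (an assumption based on the stack trace, not part of the original answer) that mimics the reachability probe Tachyon runs in its static initializer; with the VPN active, this is the call that surfaced as the ConnectException above:

import java.net.NetworkInterface
import scala.collection.JavaConverters._

// Diagnostic sketch: probe each local address the same way Tachyon's
// NetworkAddressUtils does (via InetAddress.isReachable). A VPN that blocks
// the probe makes this fail with "Permission denied (connect failed)".
NetworkInterface.getNetworkInterfaces.asScala.foreach { nic =>
  nic.getInetAddresses.asScala.foreach { addr =>
    try println(s"${addr.getHostAddress} reachable=${addr.isReachable(1000)}")
    catch { case e: Exception => println(s"${addr.getHostAddress} failed: ${e.getMessage}") }
  }
}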