Intermittent "No filesystem found" exception

Asked: 2019-10-24 13:02:38

Tags: apache-flink apache-beam flink-batch

We are running a batch job on Flink that reads data from GCS and performs some aggregations on it. Intermittently, we hit the error: No filesystem found for scheme gs

We are running Beam version 2.15.0 with the FlinkRunner, on Flink version 1.6.4.

While remote-debugging the task managers, we found that on some of them the GcsFileSystemRegistrar had never been added to the list of registered filesystem schemes. The failures occur on exactly those task managers.

The SCHEME_TO_FILESYSTEM collection in the org.apache.beam.sdk.io.FileSystems class is only modified by a call to setDefaultPipelineOptions, and since that function was never invoked on those task managers, the GcsFileSystemRegistrar was never added to SCHEME_TO_FILESYSTEM.
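As a diagnostic, the registrar discovery that setDefaultPipelineOptions performs can be reproduced on a given JVM. This is a sketch under our assumptions (the class name ListFileSystemRegistrars is ours; Beam's internal loading also involves its own class-loader handling): setDefaultPipelineOptions discovers FileSystemRegistrar implementations via java.util.ServiceLoader, so if GcsFileSystemRegistrar does not show up in this listing on a task manager, it cannot end up in SCHEME_TO_FILESYSTEM there.

```java
import java.util.ServiceLoader;

import org.apache.beam.sdk.io.FileSystemRegistrar;

/**
 * Diagnostic sketch: print every FileSystemRegistrar visible to
 * ServiceLoader on this JVM. On a healthy worker the list should
 * include org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar
 * (from the Beam GCP extensions jar) alongside the local-file registrar.
 */
public class ListFileSystemRegistrars {
  public static void main(String[] args) {
    for (FileSystemRegistrar registrar : ServiceLoader.load(FileSystemRegistrar.class)) {
      System.out.println(registrar.getClass().getName());
    }
  }
}
```

Running this with the job's exact classpath on an affected task manager would distinguish a classpath/shading problem (registrar not discoverable at all) from a registration-timing problem (registrar discoverable but setDefaultPipelineOptions never called).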

Detailed stack trace:

java.lang.IllegalArgumentException: No filesystem found for scheme gs
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:463)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
    at org.apache.beam.sdk.io.fs.ResourceIdCoder.decode(ResourceIdCoder.java:49)
    at org.apache.beam.sdk.io.fs.MetadataCoder.decodeBuilder(MetadataCoder.java:62)
    at org.apache.beam.sdk.io.fs.MetadataCoder.decode(MetadataCoder.java:58)
    at org.apache.beam.sdk.io.fs.MetadataCoder.decode(MetadataCoder.java:36)
    at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
    at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
    at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:592)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:583)
    at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:529)
    at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:92)
    at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
    at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
    at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
    at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
    at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:94)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)

To work around this, we tried calling FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create()); in the expand function of our PTransform. The intent was to make sure the GcsFileSystemRegistrar gets added to the list, but this has not solved the problem.
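One likely reason the expand() attempt does not help: expand() runs during pipeline construction on the submitting client, not on the Flink task managers whose JVMs later decode the ResourceIds. A hedged sketch of an alternative (our own RegisterFileSystemsFn, not an official Beam fix) is to run the registration in worker-side lifecycle code, e.g. a DoFn's @Setup:

```java
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;

/**
 * Sketch: identity DoFn whose only extra job is to (re-)register all
 * FileSystemRegistrars, including GcsFileSystemRegistrar, on the worker
 * JVM before it processes any elements.
 */
public class RegisterFileSystemsFn<T> extends DoFn<T, T> {
  @Setup
  public void setup() {
    // Runs once per DoFn instance on the task manager, unlike expand(),
    // which runs on the client at graph-construction time.
    FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }
}
```

Caveat: the stack trace above shows the failure inside Flink's network deserialization path (CoderTypeSerializer.deserialize), which can run before any user DoFn's @Setup on that task manager, so this may narrow the window without closing it entirely.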

Could someone help us understand why this happens and what can be done to fix it?

0 answers