Dynamically determining file locations for Google Dataflow

Date: 2018-12-15 22:48:56

Tags: google-cloud-dataflow apache-beam

I am passing a date into my Dataflow pipeline using a ValueProvider:

    @Description("Current Date in America/Los_Angeles timezone")
    ValueProvider<String> getLocalDate();
    void setLocalDate(ValueProvider<String> date);
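
For reference, a minimal sketch of how such a getter/setter pair would sit on a custom PipelineOptions interface and be registered; the interface name `DateOptions` below is an assumption, not taken from the post:

    // Minimal sketch (interface name DateOptions is an assumption, not from the post).
    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.options.ValueProvider;

    public interface DateOptions extends PipelineOptions {
      @Description("Current Date in America/Los_Angeles timezone")
      ValueProvider<String> getLocalDate();
      void setLocalDate(ValueProvider<String> date);
    }

    // Usage: DateOptions options =
    //     PipelineOptionsFactory.fromArgs(args).withValidation().as(DateOptions.class);

If the pipeline is staged as a Dataflow template, the value behind `getLocalDate()` is only available at run time, so `get()` should not be called while the graph is being constructed.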

I want to query all of the folders in a bucket for the date that is passed in and read their contents. For this I am using a custom ValueProvider:

    CustomValueProvider customValueProvider = new CustomValueProvider(options);

    p.apply(TextIO.read().from(customValueProvider.get()));

    // Inside the custom provider: list the bucket's top-level folders and keep
    // the ones whose name contains the requested date.
    String localDate = options.getLocalDate().get();  // assumed source of localDate
    Storage storage = StorageOptions.newBuilder().setProjectId(options.getProjectId()).build().getService();
    BlobListOption listOptions = BlobListOption.currentDirectory();
    Page<Blob> bucketItems = storage.list(options.getCloudStorageBucket(), listOptions);
    List<String> directoryList = new ArrayList<>();
    for (Blob item : bucketItems.iterateAll()) {
      if (item.isDirectory() && item.getName().contains(localDate)) {
        directoryList.add(item.getName());
      }
    }

I am returning the first value from this list.
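
The CustomValueProvider itself is not shown above; below is a minimal sketch of what a provider along these lines could look like, assuming it resolves the folder lazily inside `get()` and returns a fully qualified `gs://` path. The class name, constructor, and the `gs://` prefixing are assumptions, not the asker's actual code.

    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.Storage.BlobListOption;
    import com.google.cloud.storage.StorageOptions;
    import org.apache.beam.sdk.options.ValueProvider;

    // Hypothetical sketch only; not the asker's actual CustomValueProvider.
    class DateFolderProvider implements ValueProvider<String> {
      private final ValueProvider<String> localDate;
      private final String bucket;
      private final String projectId;

      DateFolderProvider(ValueProvider<String> localDate, String bucket, String projectId) {
        this.localDate = localDate;
        this.bucket = bucket;
        this.projectId = projectId;
      }

      @Override
      public String get() {
        // Resolve the date lazily, list the bucket's top-level "directories",
        // and return the first one whose name contains the date.
        String date = localDate.get();
        Storage storage = StorageOptions.newBuilder().setProjectId(projectId).build().getService();
        for (Blob item : storage.list(bucket, BlobListOption.currentDirectory()).iterateAll()) {
          if (item.isDirectory() && item.getName().contains(date)) {
            // Returning a fully qualified gs:// glob keeps FileSystems.match()
            // on the GCS filesystem rather than LocalFileSystem.
            return "gs://" + bucket + "/" + item.getName() + "*";
          }
        }
        throw new IllegalStateException("No folder found for date " + date);
      }

      @Override
      public boolean isAccessible() {
        return localDate.isAccessible();
      }
    }

Such a provider would then be passed directly to `TextIO.read().from(...)`, which accepts a `ValueProvider<String>` file pattern.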

But I am getting:

    java.lang.NullPointerException
    at org.apache.beam.sdk.io.LocalFileSystem.matchOne(LocalFileSystem.java:223)
    at org.apache.beam.sdk.io.LocalFileSystem.match(LocalFileSystem.java:90)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:119)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:140)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:152)
    at org.apache.beam.sdk.io.FileBasedSource.split(FileBasedSource.java:262)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:197)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:181)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:160)
    at com.google.cloud.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:77)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:393)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:362)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:290)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

0 Answers:

There are no answers yet.