ValueProvider Issue

Time: 2017-05-16 03:42:55

Tags: google-cloud-dataflow google-cloud-functions google-api-nodejs-client

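I am trying to read the value of a property that is passed from a Cloud Function to a Dataflow template. I get an error because the passed value is a wrapper, and calling .get() on it fails when the template is compiled. The error is: An exception occurred while executing the Java class. null: InvocationTargetException: Not called from a runtime context.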

public interface MyOptions extends DataflowPipelineOptions {
...
@Description("schema of csv file")
ValueProvider<String> getHeader();
void setHeader(ValueProvider<String> header);
...
}

public static void main(String[] args) throws IOException {
...
    List<String> sideInputColumns = Arrays.asList(options.getHeader().get().split(","));
...
    //ultimately use the getHeaders as side inputs
    PCollection<String> input = p.apply(Create.of(sideInputColumns));
    final PCollectionView<List<String>> finalColumnView = input.apply(View.asList());
}

How do I extract the value from a ValueProvider type?

1 answer:

Answer 0 (score: 2)

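The value of a ValueProvider is not available during pipeline construction. You therefore need to organize the pipeline so that it always has the same structure and serializes the ValueProvider itself. At runtime, the individual transforms in the pipeline can then inspect the value to decide how to operate.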

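Based on your example, you may need to do something like the following. It creates a single element and then uses a DoFn, evaluated at runtime, to expand the header: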

public static class HeaderDoFn extends DoFn<String, String> {
  private final ValueProvider<String> header;
  public HeaderDoFn(ValueProvider<String> header) {
    this.header = header;
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Ignore the input element; there should be exactly one.
    // header is a field, so read it directly and call get() only here, at runtime.
    for (String column : this.header.get().split(",")) {
      c.output(column);
    }
  }
}

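public static void main(String[] args) throws IOException {
  // Parse the options and create the pipeline in the usual way.
  MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
  Pipeline p = Pipeline.create(options);

  PCollection<String> input = p
    .apply(Create.of("one")) // create a single element
    // Expand the header at runtime, when the ValueProvider can be read.
    .apply(ParDo.of(new HeaderDoFn(options.getHeader())));

  // Note that the order of this list is not guaranteed.
  final PCollectionView<List<String>> finalColumnView =
    input.apply(View.asList());
}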

Another option is to use NestedValueProvider to create a ValueProvider<List<String>> from the option, and to pass that ValueProvider<List<String>> to the DoFns that need it instead of using a side input.
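
The idea behind NestedValueProvider can be sketched in plain Java. This is only a minimal model, written so that it runs without Beam on the classpath: DeferredValue, RuntimeValue, NestedValue and columnsOf are hypothetical stand-ins, not Beam classes. It shows the translation from the header string to a list of columns being registered at construction time and executed only once the value has been supplied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class NestedHeaderSketch {

  /** Simplified stand-in for Beam's ValueProvider<T>; NOT the real interface. */
  interface DeferredValue<T> {
    T get();
    boolean isAccessible();
  }

  /** Stand-in for a template parameter whose value is only supplied at run time. */
  static final class RuntimeValue<T> implements DeferredValue<T> {
    private T value;
    private boolean assigned;

    void set(T newValue) {
      this.value = newValue;
      this.assigned = true;
    }

    @Override
    public T get() {
      if (!assigned) {
        throw new IllegalStateException("Not called from a runtime context.");
      }
      return value;
    }

    @Override
    public boolean isAccessible() {
      return assigned;
    }
  }

  /** Stand-in for NestedValueProvider: the translator runs only inside get(). */
  static final class NestedValue<X, T> implements DeferredValue<T> {
    private final DeferredValue<X> source;
    private final Function<X, T> translator;

    NestedValue(DeferredValue<X> source, Function<X, T> translator) {
      this.source = source;
      this.translator = translator;
    }

    @Override
    public T get() {
      return translator.apply(source.get());
    }

    @Override
    public boolean isAccessible() {
      return source.isAccessible();
    }
  }

  /** Hypothetical helper: wraps a comma-separated header as a deferred column list. */
  static DeferredValue<List<String>> columnsOf(DeferredValue<String> header) {
    // With Beam, the corresponding call would be
    // ValueProvider.NestedValueProvider.of(options.getHeader(), translator).
    return new NestedValue<String, List<String>>(header, h -> Arrays.asList(h.split(",")));
  }

  public static void main(String[] args) {
    RuntimeValue<String> header = new RuntimeValue<>();
    // Construction time: the wrapper is built, but get() is never called.
    DeferredValue<List<String>> columns = columnsOf(header);
    // Run time: the value arrives and can now be read.
    header.set("id,name,price");
    System.out.println(columns.get()); // prints [id, name, price]
  }
}
```

In an actual pipeline, the resulting ValueProvider<List<String>> would be stored in a field of the DoFn, as HeaderDoFn does with the header, and get() would be called only inside processElement.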