I need to perform the following operations in order:
PCollection<String> read = p.apply("Read Lines", TextIO.read().from(options.getInputFile()))
    .apply("Get fileName", ParDo.of(new DoFn<String, String>() {
        ValueProvider<String> fileReceived = options.getfilename();
        String fileName;

        @ProcessElement
        public void processElement(ProcessContext c) {
            fileName = fileReceived.get();
            LOG.info("File: " + fileName);
        }
    }));
PCollection<TableRow> rows = p.apply("Read from BigQuery",
    BigQueryIO.read()
        .fromQuery("SELECT table,schema FROM `DatasetID.TableID` WHERE file='" + fileName + "'")
        .usingStandardSql());
How can I achieve this in Apache Beam / Dataflow?
Answer 0 (score: 1)
It sounds like you want to apply BigQueryIO.read().fromQuery() with a query that depends on the value of a ValueProvider<String>-typed property of your PipelineOptions, and that value is not available at pipeline construction time, i.e. you are invoking the job via a template. In that case, the correct solution is NestedValueProvider:
PCollection<TableRow> tableRows = p.apply(BigQueryIO.read().fromQuery(
    NestedValueProvider.of(
        options.getfilename(),
        new SerializableFunction<String, String>() {
          @Override
          public String apply(String filename) {
            // Use the function's own parameter, not a variable captured at
            // pipeline construction time.
            return "SELECT table,schema FROM `DatasetID.TableID` WHERE file='" + filename + "'";
          }
        })));
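The key point is that the SerializableFunction must build the query from its own filename parameter, which Beam resolves at template execution time; referencing an outer fileName variable would bake in whatever value existed at construction time. As a minimal plain-Java sketch of just that string transformation (Beam types omitted; the table name is taken from the question):

```java
import java.util.function.Function;

public class QueryBuilder {
    // Mirrors the SerializableFunction passed to NestedValueProvider.of:
    // it maps the runtime-provided file name to the final query string.
    static final Function<String, String> TO_QUERY = filename ->
        "SELECT table,schema FROM `DatasetID.TableID` WHERE file='" + filename + "'";

    public static void main(String[] args) {
        // Example: the template is launched with filename = "sales.csv".
        System.out.println(TO_QUERY.apply("sales.csv"));
    }
}
```

NestedValueProvider applies this function lazily, so get() on the resulting provider is only called once the runtime value of options.getfilename() is known.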