I have a pipeline built with Apache Beam that reads a CSV file and inserts its contents into BigQuery. The pipeline has three steps (applies): 1. read the CSV, 2. convert each text line to a TableRow, and 3. write the rows to BigQuery (BigQueryIO.writeTableRows()).
I created a template, but when I execute it (with the Dataflow runner) it only runs the BigQuery step. The steps that read the CSV and convert the text to TableRow never start.
What is going on?
-- This is the failed execution --
I tried commenting out the BigQuery block (apply), and then the preceding steps do run. I also tried building the pipeline in parallel and running it. The problem only appears when I chain the BigQuery step (apply) onto the pipeline.
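For reference, the variant with the BigQuery apply commented out, which does run the read and conversion steps, is just the pipeline below minus the last apply:

p.apply("read csv", TextIO.read().from(sourceFilePath))
        .apply("string to tablerow", ParDo.of(new FormatForBigquery()));
// .apply("write to bigquery", ...) is commented out in this variant
p.run();

The full pipeline, with the BigQuery step included, is: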
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.ParDo;

public static void main(String[] args) throws Throwable {
    String sourceFilePath = "gs://dgomez_test/input.csv";
    String tempLocationPath = "gs://dgomez_test/tmp";

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    options.setTempLocation(tempLocationPath);
    options.setJobName("csvtobq");

    Pipeline p = Pipeline.create(options);

    // 1. read the CSV, 2. convert each line to a TableRow, 3. write the rows to BigQuery
    p.apply("read csv", TextIO.read().from(sourceFilePath))
            .apply("string to tablerow", ParDo.of(new FormatForBigquery()))
            .apply("write to bigquery",
                    BigQueryIO.writeTableRows().to(TABLE) // TABLE is a constant defined elsewhere
                            .withSchema(FormatForBigquery.getSchema())
                            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

    p.run();
}
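The pipeline also refers to a FormatForBigquery DoFn that is not shown above. For context, a minimal sketch of such a class is below; the column names (name, value), their types, and the simple comma split are placeholders, not the actual schema or parsing from the job:

import java.util.ArrayList;
import java.util.List;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.transforms.DoFn;

public class FormatForBigquery extends DoFn<String, TableRow> {

    @ProcessElement
    public void processElement(ProcessContext c) {
        // Placeholder parsing: split one CSV line into two columns (no quoting or escaping handled).
        String[] parts = c.element().split(",");
        c.output(new TableRow()
                .set("name", parts[0])
                .set("value", parts[1]));
    }

    // Schema handed to BigQueryIO.writeTableRows().withSchema(...) in main().
    public static TableSchema getSchema() {
        List<TableFieldSchema> fields = new ArrayList<>();
        fields.add(new TableFieldSchema().setName("name").setType("STRING"));
        fields.add(new TableFieldSchema().setName("value").setType("STRING"));
        return new TableSchema().setFields(fields);
    }
}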