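I finally managed to upload the data to BQ after discovering that the schema was incorrect. However, debugging was very difficult because DirectRunner produced no logs at all. How can I debug WriteToBigQuery when I have, for example, a schema error?
My code: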
import apache_beam as beam
from apache_beam import window
import jsons

# `messages`, `process_xmls` and `fp_schema` are defined elsewhere.
lines = messages | 'decode' >> beam.Map(lambda x: x.decode('utf-8'))
output = (lines
          | 'process' >> beam.FlatMap(lambda xml: [jsons.dump(model) for model in process_xmls(xml)])
          | beam.WindowInto(window.FixedWindows(1, 0)))
output | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
    table='dataflow.test_V1',
    schema=fp_schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
Answer 0 (score: 1)
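The beam.io.WriteToBigQuery PTransform returns a dictionary whose BigQueryWriteFn.FAILED_ROWS entry contains a PCollection of all the rows that failed to be written. The errors themselves are logged at https://github.com/apache/beam/blob/release-2.13.0/sdks/python/apache_beam/io/gcp/bigquery.py#L861, so they should show up in the worker logs.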