Python Apache Beam data pipeline: group by

Time: 2019-03-17 11:06:01

Tags: python apache-beam

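I'm a bit confused about the transforms here. I'm trying to group by ID (the key column), sort by DATE desc, and keep only the latest record for each ID. This is similar to ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE DESC) in SQL, keeping the rows numbered 1.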

I'm not sure how to get started on the next step. Any help is really appreciated.

import apache_beam as beam

# `p` (the beam.Pipeline) and get_schema() are not shown in the question.
(p
 | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(
     query="""SELECT COL1, COL2, ID, DATE
              FROM `test.rex.t1`
              LIMIT 1000""",
     use_standard_sql=True))
 # (the "latest record per ID" step would go here)
 | 'Write to BigQuery' >> beam.io.Write(
     beam.io.BigQuerySink(
         'test:res.t1_test',
         schema=get_schema('t1'),
         write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)))
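One possible way to express the "latest record per ID" step is to key each row by ID and combine the rows for each key down to the one with the greatest DATE. The sketch below is only an illustration under some assumptions: rows arrive as Python dicts (as they do from a BigQuery read), DATE values compare correctly as-is (for example ISO `YYYY-MM-DD` strings), and the names `latest_by_date` and `keep_latest_per_id` are hypothetical helpers, not part of Beam.

```python
def latest_by_date(rows):
    """Return the row with the greatest DATE from an iterable of dict rows."""
    # Assumption: DATE values are directly comparable, e.g. ISO 'YYYY-MM-DD'
    # strings, so the maximum is the most recent record.
    return max(rows, key=lambda row: row['DATE'])


def keep_latest_per_id(rows):
    """Apply the group-and-pick-latest step to a PCollection of dict rows."""
    # Imported here so latest_by_date can be tried without Beam installed.
    import apache_beam as beam

    return (
        rows
        # Turn each row into an (ID, row) pair so Beam can group on ID.
        | 'KeyById' >> beam.Map(lambda row: (row['ID'], row))
        # Reduce all rows that share an ID to the single latest one.
        | 'LatestPerId' >> beam.CombinePerKey(latest_by_date)
        # Drop the key again, leaving plain row dicts for the sink.
        | 'DropKey' >> beam.Values())
```

In the pipeline from the question, the output of the 'ReadTable' step could be passed to `keep_latest_per_id(...)` and the result piped into the 'Write to BigQuery' step. `CombinePerKey` was chosen here over `GroupByKey` followed by a sort because taking a maximum can be computed incrementally; if ties on DATE matter, the comparison key would need a tie-breaker column.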

0 answers:

No answers