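I'm a bit confused about the transform needed here. I'm trying to group by ID (the key column), sort by DATE desc, and keep only the latest record for each ID. This is similar to ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATE DESC) and keeping the first row.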
I'm not sure how to get started on the next step. Any help is really appreciated.
(p
    | 'ReadTable' >> beam.io.Read(beam.io.BigQuerySource(
        query="""SELECT COL1, COL2, ID, DATE
                 FROM `test.rex.t1`
                 LIMIT 1000""",
        use_standard_sql=True))
    | 'Write to BigQuery' >> beam.io.Write(
        beam.io.BigQuerySink(
            'test:res.t1_test',
            schema=get_schema('t1'),
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
)
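
One possible shape for the missing step, sketched under some assumptions: each element coming out of 'ReadTable' is a dict keyed by column name, and DATE values compare correctly with ordinary comparison (for example ISO `YYYY-MM-DD` strings). The names `latest_by_date` and `keep_latest_per_id` are hypothetical helpers, not part of Beam. The idea is to key each row by ID and let `beam.CombinePerKey` reduce every group to the row with the greatest DATE.

```python
def latest_by_date(rows):
    """Return the row with the greatest DATE from an iterable of row dicts.

    Assumes DATE values are directly comparable (e.g. ISO date strings).
    Taking a maximum is associative and commutative, so it is safe to
    apply to partial groups as a combiner.
    """
    return max(rows, key=lambda row: row["DATE"])


def keep_latest_per_id(rows):
    """Hypothetical helper: reduce a PCollection of row dicts to one row per ID."""
    # Imported lazily so latest_by_date can be tried without Beam installed.
    import apache_beam as beam

    return (
        rows
        | "KeyById" >> beam.Map(lambda row: (row["ID"], row))
        | "LatestPerId" >> beam.CombinePerKey(latest_by_date)
        | "DropKey" >> beam.Values()
    )
```

In the pipeline above, these three steps would sit between 'ReadTable' and 'Write to BigQuery'. If the rows do not need any other processing in Beam, the same result could probably also be obtained by putting the ROW_NUMBER() filter directly in the BigQuery query.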