I am looking for a mechanism to check the accuracy of the data imported every day into several BigQuery tables. Each table has a similar format, with a DATE and an ID column. The tables look like this:
Table_1
| DATE       | ID |
|------------|----|
| 2018-10-01 | A  |
| 2018-10-01 | B  |
| 2018-10-02 | A  |
| 2018-10-02 | B  |
| 2018-10-02 | C  |
What I want to monitor is the evolution of the number of IDs, through an output table like this:
CONTROL_TABLE
| DATE       | COUNT(Table1.ID) | COUNT(Table2.ID) | COUNT(Table3.ID) |
|------------|------------------|------------------|------------------|
| 2018-10-01 | 2                | 487654           | 675386           |
| 2018-10-02 | 3                | 488756           | 675447           |
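For just a few tables, a query of roughly this shape would give me that output (a minimal sketch; `my_project.my_dataset` and the `count_table*` aliases are placeholders), but it does not scale to my real setup:

```sql
-- Sketch: count IDs per DATE in each table, then line the counts up by DATE.
SELECT
  DATE,
  t1.cnt AS count_table1,
  t2.cnt AS count_table2,
  t3.cnt AS count_table3
FROM (SELECT DATE, COUNT(ID) AS cnt FROM `my_project.my_dataset.Table_1` GROUP BY DATE) AS t1
JOIN (SELECT DATE, COUNT(ID) AS cnt FROM `my_project.my_dataset.Table_2` GROUP BY DATE) AS t2 USING (DATE)
JOIN (SELECT DATE, COUNT(ID) AS cnt FROM `my_project.my_dataset.Table_3` GROUP BY DATE) AS t3 USING (DATE)
ORDER BY DATE;
```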
I am trying to do this with one single SQL query, but I am running into some DML limitations, such as:
-> One single SELECT joining all the tables is out of the question for performance reasons (20+ tables with millions of rows)
-> I was thinking of going through temporary tables, but it seems I cannot run multiple DELETE + INSERT statements on several tables with DML (see the sketch after this list)
-> I cannot use a wildcard table as the output of the query
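For reference, the multi-statement flow I was considering looks roughly like this (a sketch only; the project, dataset and `count_table1` names are placeholders, and it would have to run as a multi-statement script). Getting this to run for all 20+ tables as one query is exactly what I have not managed to do:

```sql
-- Sketch of the temporary-table idea for a single source table.
CREATE TEMP TABLE tmp_counts AS
SELECT DATE, COUNT(ID) AS cnt
FROM `my_project.my_dataset.Table_1`
GROUP BY DATE;

-- Remove any counts already stored for those dates...
DELETE FROM `my_project.my_dataset.CONTROL_TABLE`
WHERE DATE IN (SELECT DATE FROM tmp_counts);

-- ...then insert the fresh counts for Table_1; the same DELETE + INSERT
-- would have to be repeated for each of the 20+ tables.
INSERT INTO `my_project.my_dataset.CONTROL_TABLE` (DATE, count_table1)
SELECT DATE, cnt FROM tmp_counts;
```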
Would anyone know how best to obtain this kind of result (ideally with one single query)?