I am struggling with this query (a dummy version; the real one has more fields):
UPDATE
  table1 as base
SET
  lines =
    ARRAY(
      SELECT AS STRUCT
        b.line_id,
        s.purch_id,
        ARRAY(
          SELECT AS STRUCT
            wh.warehouse_id,
            s.is_proposed,
          FROM table1 as t, UNNEST(lines) as lb, UNNEST(lb.warehouses) as wh
          INNER JOIN
            (SELECT
               l.line_id,
               wh.is_proposed
             FROM table2, UNNEST(lines) as l, UNNEST(l.warehouses) as wh) as s
          ON lb.line_id = s.line_id AND wh.warehouse_id = s.warehouse_id)
      FROM table1, UNNEST(lines) as b
      INNER JOIN UNNEST(supply.lines) as s
      ON b.line_id = s.line_id)
FROM
  table2 as supply
WHERE
  base.date = supply.date
  AND
  base.sales_id = supply.sales_id
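table1 and table2 have the same nesting:
lines: repeated record
lines.warehouses: repeated record within lines (so {..., lines[{..., warehouses[]}]})
In addition, table1 is a subset of table2 and carries only a subset of its fields. Those fields start out as NULL in table1 (the information arrives asynchronously, so I refresh it as the data becomes available).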
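For readers who want to picture the nesting, here is a rough sketch of how such a table could be declared. The dataset name `mydataset` and every column type are assumptions made for illustration only; the question names the fields but does not give their types.

```sql
-- Hypothetical declaration of the nesting described above.
-- `mydataset` and all column types are assumptions, not taken from the question.
CREATE TABLE mydataset.table1 (
  `date` DATE,
  sales_id INT64,
  lines ARRAY<STRUCT<
    line_id INT64,
    purch_id INT64,              -- NULL until the matching table2 row arrives
    warehouses ARRAY<STRUCT<
      warehouse_id INT64,
      is_proposed BOOL           -- NULL until the matching table2 row arrives
    >>
  >>
);
```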
I first tried this intermediate step (which succeeded):
UPDATE
  table1 as base
SET
  lines =
    ARRAY(
      SELECT AS STRUCT
        b.line_id,
        s.purch_id,
        b.warehouses
      FROM table1, UNNEST(lines) as b
      INNER JOIN UNNEST(supply.lines) as s
      ON b.line_id = s.line_id)
FROM
  table2 as supply
WHERE
  base.date = supply.date
  AND
  base.sales_id = supply.sales_id
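But I also need to update lines.warehouses, so while I am glad this works, it is not enough.
The full query is valid, and when I run the last ARRAY part on its own in the terminal, it is fast and the output has no duplicates. The full UPDATE, however, never finishes (I killed it after 20 minutes).
The tables are not that big: 20k rows on each side (220k when fully flattened).
So am I doing something wrong? Is there a better way?
Thanks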
Answer 0 (score: 1)
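I finally solved this, and it was much simpler than I thought. I think I had misunderstood how the nesting of the whole query works.
The inner subqueries do not need to select from table1 or table2 again. The row pair already matched at the top level (base and supply) is visible inside the nested ARRAY subqueries, so I simply chained the data available from that matched row down to the innermost array. The filtering done at the top level propagates to the lower levels.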
UPDATE
  table1 as base
SET
  lines =
    ARRAY(
      SELECT AS STRUCT
        b.line_id,
        s.purch_id,
        ARRAY(
          SELECT AS STRUCT
            wh.warehouse_id,
            sh.is_proposed,
          FROM UNNEST(b.warehouses) as wh -- take only upper level data
          INNER JOIN UNNEST(s.warehouses) as sh -- idem
          ON wh.warehouse_id = sh.warehouse_id) -- no need to 'redo' the joining on already filtered ones
      FROM UNNEST(base.lines) as b
      INNER JOIN UNNEST(supply.lines) as s
      ON b.line_id = s.line_id)
FROM
  table2 as supply
WHERE
  base.date = supply.date
  AND
  base.sales_id = supply.sales_id
The query succeeded in under 1 minute.
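To check the correlated UNNEST pattern without touching the real tables, the same nesting can be tried as a plain SELECT over inline data. This is only a sketch: the CTEs `base` and `supply` and all the values in them are made up to stand in for table1 and table2, and the `date` column is left out for brevity.

```sql
-- Hypothetical one-row stand-ins for table1 (base) and table2 (supply).
WITH base AS (
  SELECT 1 AS sales_id, [
    STRUCT(10 AS line_id,
           CAST(NULL AS INT64) AS purch_id,
           [STRUCT(100 AS warehouse_id,
                   CAST(NULL AS BOOL) AS is_proposed)] AS warehouses)
  ] AS lines
),
supply AS (
  SELECT 1 AS sales_id, [
    STRUCT(10 AS line_id,
           555 AS purch_id,
           [STRUCT(100 AS warehouse_id,
                   TRUE AS is_proposed)] AS warehouses)
  ] AS lines
)
SELECT
  base.sales_id,
  ARRAY(
    SELECT AS STRUCT
      b.line_id,
      s.purch_id,
      ARRAY(
        SELECT AS STRUCT
          wh.warehouse_id,
          sh.is_proposed
        -- Only the arrays of the current line pair are unnested here.
        FROM UNNEST(b.warehouses) AS wh
        INNER JOIN UNNEST(s.warehouses) AS sh
          ON wh.warehouse_id = sh.warehouse_id
      ) AS warehouses
    -- Only the arrays of the current base/supply row pair are unnested here.
    FROM UNNEST(base.lines) AS b
    INNER JOIN UNNEST(supply.lines) AS s
      ON b.line_id = s.line_id
  ) AS lines
FROM base
INNER JOIN supply
  ON base.sales_id = supply.sales_id;
```

With this sample data the result should be a single row whose line 10 carries purch_id 555 and whose warehouse 100 has is_proposed set to true. Every UNNEST reads an array from a row that an outer level has already matched, so no table is scanned again inside the subqueries.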