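We make extensive use of ARRAY and STRUCT in BQ, to the point where we can no longer change the contents of an array because of the error in the subject line.
See the following simple example, which uses public data. Assume INNER JOIN cannot be used because some images are missing, whether on purpose or by mistake.
Now, I know that you can usually move the LEFT JOIN out of the array redefinition into the FROM clause and use ARRAY_AGG, but that is not always possible.
In our case, the "other fields" outside the array being updated are themselves arrays or structs, as in the github-nested table, for example.
Since you cannot run SELECT DISTINCT on STRUCT or ARRAY fields, you end up having to UNNEST everything and rebuild the table from scratch with many ARRAY_AGG calls, which costs a lot of resources and risks OOM. For tables with many nested fields this is not feasible.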
SELECT
  * EXCEPT(webDetection),
  STRUCT(
    webDetection.partialMatchingImages,
    webDetection.pagesWithMatchingImages,
    webDetection.fullMatchingImages,
    ARRAY(
      SELECT AS STRUCT
        fmi.score,
        fmi.url,
        i.object_id
      FROM
        data.webDetection.fullMatchingImages fmi
      LEFT JOIN
        `bigquery-public-data.the_met.images` i
      ON
        fmi.url = i.original_image_url
    ) AS fullMatchingImages_from_met,
    webDetection.webEntities
  ) AS webDetection
FROM
  `bigquery-public-data.the_met.vision_api_data` data
Any ideas on how to avoid the re-aggregation?
Answer 0 (score: 2)
Assuming your query is conceptually correct and the only problem is the error "correlated subqueries that reference other tables are not supported", try replacing the fragment below
FROM
  data.webDetection.fullMatchingImages fmi
LEFT JOIN
  `bigquery-public-data.the_met.images` i
ON
  fmi.url = i.original_image_url
with
FROM
  data.webDetection.fullMatchingImages fmi
CROSS JOIN
  `bigquery-public-data.the_met.images` i
WHERE
  fmi.url = i.original_image_url
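To see what this replacement does, here is a minimal sketch on inline data. The CTE names `d` and `images`, the flattened column layout, and the sample URLs are made up for illustration and are not the public tables; I have not run it against the real dataset. Because CROSS JOIN plus WHERE behaves like an inner join, an array element without a matching image is dropped rather than kept with a NULL object_id.

```sql
-- Hypothetical inline data standing in for the public tables.
WITH d AS (
  SELECT
    1 AS id,
    [STRUCT(0.9 AS score, 'http://a' AS url),
     STRUCT(0.5 AS score, 'http://b' AS url)] AS fullMatchingImages
),
images AS (
  -- Only 'http://a' has a matching image.
  SELECT 100 AS object_id, 'http://a' AS original_image_url
)
SELECT
  d.id,
  ARRAY(
    SELECT AS STRUCT
      fmi.score,
      fmi.url,
      i.object_id
    FROM
      UNNEST(d.fullMatchingImages) AS fmi
    CROSS JOIN
      images i
    WHERE
      fmi.url = i.original_image_url
  ) AS fullMatchingImages_from_met
FROM
  d
```

With this data the resulting array should hold only the 'http://a' element; the 'http://b' element is lost, which is the gap a LEFT JOIN would have covered.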
Update, to also include the non-matching URLs:
SELECT * EXCEPT(webDetection),
  STRUCT(
    webDetection.partialMatchingImages,
    webDetection.pagesWithMatchingImages,
    webDetection.fullMatchingImages,
    ARRAY(
      SELECT AS STRUCT *
      FROM t.webDetection.fullMatchingImages_from_met_temp
      UNION ALL
      SELECT AS STRUCT *, NULL
      FROM t.webDetection.fullMatchingImages
      WHERE NOT url IN (SELECT url FROM t.webDetection.fullMatchingImages_from_met_temp)
    ) AS fullMatchingImages_from_met,
    webDetection.webEntities
  ) AS webDetection
FROM (
  SELECT * EXCEPT(webDetection),
    STRUCT(
      webDetection.partialMatchingImages,
      webDetection.pagesWithMatchingImages,
      webDetection.fullMatchingImages,
      ARRAY(
        SELECT AS STRUCT
          fmi.score,
          fmi.url,
          i.object_id
        FROM data.webDetection.fullMatchingImages fmi
        JOIN `bigquery-public-data.the_met.images` i
        ON fmi.url = i.original_image_url
      ) AS fullMatchingImages_from_met_temp,
      webDetection.webEntities
    ) AS webDetection
  FROM `bigquery-public-data.the_met.vision_api_data` data
) t
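The padding step of the outer query can be looked at in isolation on literal arrays. This is only a sketch: the names `all_images` and `matched` are hypothetical, and it spells the filler column as CAST(NULL AS INT64) AS object_id instead of a bare NULL so that both branches of the UNION ALL have the same struct type.

```sql
-- Hypothetical literal arrays: all elements, and the subset that joined.
WITH t AS (
  SELECT
    [STRUCT(0.9 AS score, 'http://a' AS url),
     STRUCT(0.5 AS score, 'http://b' AS url)] AS all_images,
    [STRUCT(0.9 AS score, 'http://a' AS url, 100 AS object_id)] AS matched
)
SELECT
  ARRAY(
    -- Elements that found a match keep their object_id.
    SELECT AS STRUCT *
    FROM UNNEST(t.matched)
    UNION ALL
    -- Elements with no match are appended with a NULL object_id.
    SELECT AS STRUCT *, CAST(NULL AS INT64) AS object_id
    FROM UNNEST(t.all_images)
    WHERE url NOT IN (SELECT url FROM UNNEST(t.matched))
  ) AS padded
FROM
  t
```

The result should contain two elements, one for 'http://a' with object_id 100 and one for 'http://b' with a NULL object_id, which is what the LEFT JOIN was meant to produce. The element order inside the array is not guaranteed.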
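Answer 1 (score: 0)
To expand on the answer above: the query (a different one, in my case) may still fail because the optimizer still considers the subquery too complex.
In that case, try avoiding UNION ALL and use ARRAY_CONCAT() instead: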
SELECT * EXCEPT(webDetection),
  STRUCT(
    webDetection.partialMatchingImages,
    webDetection.pagesWithMatchingImages,
    webDetection.fullMatchingImages,
    ARRAY_CONCAT(
      ARRAY(
        SELECT AS STRUCT *
        FROM t.webDetection.fullMatchingImages_from_met_temp
      ),
      ARRAY(
        SELECT AS STRUCT *, NULL
        FROM t.webDetection.fullMatchingImages
        WHERE NOT url IN (SELECT url FROM t.webDetection.fullMatchingImages_from_met_temp)
      )
    ) AS fullMatchingImages_from_met,
    webDetection.webEntities
  ) AS webDetection
FROM (
  SELECT * EXCEPT(webDetection),
    STRUCT(
      webDetection.partialMatchingImages,
      webDetection.pagesWithMatchingImages,
      webDetection.fullMatchingImages,
      ARRAY(
        SELECT AS STRUCT
          fmi.score,
          fmi.url,
          i.object_id
        FROM data.webDetection.fullMatchingImages fmi
        JOIN `bigquery-public-data.the_met.images` i
        ON fmi.url = i.original_image_url
      ) AS fullMatchingImages_from_met_temp,
      webDetection.webEntities
    ) AS webDetection
  FROM `bigquery-public-data.the_met.vision_api_data` data
) t
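BQ accepted this and, interestingly, it was even faster than the UNION ALL version!
On the other hand, even though lengthy workarounds built on ARRAY(... INNER JOIN ...) exist (and they may not keep working for long), the BigQuery optimizer still needs further tuning. The original error message reads "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN", and a simple LEFT JOIN looks like a very efficient JOIN to me...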
@readers, FYI: a bug has been filed here. Be sure to "star" it to raise its priority!