How to join two tables based on an array condition

Time: 2020-04-17 03:10:26

Tags: google-bigquery

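I have 2 tables:

The Product table contains a productTitle [string] column.

The keyword mapping table contains 2 columns: category [string] and keywords [repeated record of strings].

Note: the keywords are mutually exclusive: a regexp_contains(keyword) match assigns you to exactly 1 specific cat_id.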

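My goal: join the 2 tables so that every prodTitle has a cat_id.

Join logic: if prodTitle contains a keyword (case-insensitive), assign that keyword's category_id to the prodTitle.

How would you do this efficiently?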

1 Answer:

Answer 0: (score: 2)

To get one row per product, I group by productId and productTitle. If you don't want the GROUP BY, you can remove it and change ARRAY_AGG(cat_id) to a plain cat_id. So I think you can use this:

WITH 
products AS (
  SELECT 1 productId, 'lorem ipsum cat1 lorem ipsum' as productTitle union all
  SELECT 2 productId, 'lorem ipsum cat2 lorem ipsum' as productTitle union all
  SELECT 3 productId, 'lorem ipsum cat3 lorem ipsum' as productTitle union all
  SELECT 4 productId, 'lorem ipsum cat4 lorem ipsum' as productTitle
),
categories AS (
  SELECT 1 as cat_id, ['cat1', 'something', 'else'] as keywords union all
  SELECT 2 as cat_id, ['cat2', 'another', 'keyword'] as keywords
)
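-- One row per product; the matching category ids are collected into an array.
SELECT productId, productTitle, ARRAY_AGG(cat_id) AS categories
FROM products p
CROSS JOIN categories c
-- Keep a product/category pair when any keyword occurs in the title.
-- LOWER() on both sides makes the match case-insensitive, as the question asks.
WHERE EXISTS (
  SELECT 1 FROM UNNEST(c.keywords) AS k
  WHERE LOWER(p.productTitle) LIKE CONCAT('%', LOWER(k), '%')
)
GROUP BY 1, 2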
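
Note that the query above only returns products that match at least one keyword, so products 3 and 4 from the sample data drop out of the result. If every product should appear, with a NULL cat_id when nothing matches, the following is a rough sketch of one way to do it. It is not part of the original answer. The CTE names keyword_map and matches are made up for illustration, and it assumes the same sample data and that keywords should be treated as literal substrings (hence STRPOS instead of LIKE, so that % or _ inside a keyword is not read as a wildcard).

```sql
WITH
products AS (
  SELECT 1 AS productId, 'lorem ipsum cat1 lorem ipsum' AS productTitle UNION ALL
  SELECT 2, 'lorem ipsum cat2 lorem ipsum' UNION ALL
  SELECT 3, 'lorem ipsum cat3 lorem ipsum' UNION ALL
  SELECT 4, 'lorem ipsum cat4 lorem ipsum'
),
categories AS (
  SELECT 1 AS cat_id, ['cat1', 'something', 'else'] AS keywords UNION ALL
  SELECT 2, ['cat2', 'another', 'keyword']
),
-- Hypothetical helper: one row per (cat_id, keyword), lowercased once up front.
keyword_map AS (
  SELECT c.cat_id, LOWER(k) AS keyword
  FROM categories c, UNNEST(c.keywords) AS k
),
-- Hypothetical helper: product/category pairs where a keyword occurs in the title.
-- DISTINCT guards against a title that contains several keywords of one category.
matches AS (
  SELECT DISTINCT p.productId, m.cat_id
  FROM products p
  CROSS JOIN keyword_map m
  WHERE STRPOS(LOWER(p.productTitle), m.keyword) > 0
)
-- Plain equality join back to products, so unmatched products are kept.
SELECT p.productId, p.productTitle, m.cat_id
FROM products p
LEFT JOIN matches m
  ON p.productId = m.productId
ORDER BY p.productId
```

With the sample data, products 1 and 2 get cat_id 1 and 2, and products 3 and 4 come back with a NULL cat_id. This relies on the keywords being mutually exclusive, as stated in the question; if a title could match two categories, it would appear on two rows.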