How to replicate a correlated subquery in BigQuery

Date: 2016-01-19 04:29:59

Tags: google-bigquery

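BigQuery does not support correlated anti-semi-joins. I'm currently struggling with a problem that would be much easier to solve if that statement were false.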

I have two tables.

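  • One table contains a repeated integer field of all possible identifiers (ProfileGroupModel.sources)
  • The second contains a repeated integer field of values that have been ignored from the list of "possible" accounts (PRO.ignoredSourceList)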

If BigQuery did support this type of query, this is how I would get the results I'm after:

SELECT
  pro.pid,
  prg.sources,
FROM [datastore.PRO] AS pro 
JOIN EACH FLATTEN([datastore.ProfileGroupModel], sources) AS prg ON prg.gid = pro.gid
WHERE prg.sources NOT IN (
  SELECT ignoredSourceList
  FROM FLATTEN([datastore.PRO], ignoredSourceList) as proInner
  WHERE proInner.pid = pro.pid
)

Does anyone have any pointers on how to unroll this into a working solution within BigQuery's limits?

1 answer:

Answer 0: (score: 1)

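This is a bit of a shot in the dark as to the exact logic behind the example above, but the following should work. It replaces the correlated NOT IN with a LEFT JOIN on both pid and the source value, then keeps only the rows for which no matching ignored source was found: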

SELECT 
  pro.pid AS pid,
  prg.sources AS source 
FROM [datastore.PRO] AS pro 
JOIN EACH FLATTEN([datastore.ProfileGroupModel], sources) AS prg 
  ON prg.gid = pro.gid
LEFT JOIN EACH FLATTEN([datastore.PRO], ignoredSourceList) AS proInner 
  ON proInner.pid = pro.pid AND  prg.sources = proInner.ignoredSourceList
WHERE proInner.ignoredSourceList IS NULL 
  AND proInner.pid IS NULL
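
To sanity-check the rewrite outside BigQuery, here is a minimal sketch in Python using the standard sqlite3 module. It is not BigQuery code: the tables pro, prg_sources and pro_ignored, their columns, and the sample rows are hypothetical, already-flattened stand-ins for datastore.PRO and datastore.ProfileGroupModel, invented purely for illustration. It runs the correlated NOT IN form from the question and the LEFT JOIN ... IS NULL form from this answer side by side and checks that they return the same rows.

```python
import sqlite3

# In-memory database with hypothetical, already-flattened stand-ins
# for the repeated fields in the question (names and data are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pro (pid INTEGER, gid INTEGER);
    CREATE TABLE prg_sources (gid INTEGER, source INTEGER);
    CREATE TABLE pro_ignored (pid INTEGER, ignored_source INTEGER);

    INSERT INTO pro VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO prg_sources VALUES (10, 100), (10, 101), (10, 102), (20, 200);
    INSERT INTO pro_ignored VALUES (1, 101), (2, 100), (2, 102);
""")

# The form the question wants: a correlated NOT IN subquery.
correlated_sql = """
    SELECT pro.pid, prg.source
    FROM pro
    JOIN prg_sources AS prg ON prg.gid = pro.gid
    WHERE prg.source NOT IN (
        SELECT ign.ignored_source
        FROM pro_ignored AS ign
        WHERE ign.pid = pro.pid
    )
    ORDER BY pro.pid, prg.source
"""

# The form used in the answer: LEFT JOIN on both keys, keep unmatched rows.
anti_join_sql = """
    SELECT pro.pid, prg.source
    FROM pro
    JOIN prg_sources AS prg ON prg.gid = pro.gid
    LEFT JOIN pro_ignored AS ign
      ON ign.pid = pro.pid AND ign.ignored_source = prg.source
    WHERE ign.ignored_source IS NULL
    ORDER BY pro.pid, prg.source
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()

print(correlated_rows)
print(anti_join_rows)
print(correlated_rows == anti_join_rows)
```

On this sample data both queries keep every (pid, source) pair whose source is not in that pid's ignored list. One caveat: the two forms only agree when the ignored values contain no NULLs, because NOT IN against a set that includes NULL filters out every row, while the LEFT JOIN form does not.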