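EDIT: Added a third requirement after using Tim Biegeleisen's solution.
EDIT2: Changed Robbie's DOB so that it falls before his parents' marriage date.
I'm trying to create a query that looks at two tables and determines a date difference based on a percentage. I know, super confusing... let me try to explain using the tables below: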
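Bob and Mary married on 2010-01-01 and expect 4 kids (Parent table). Their kids are listed in the Child table.
Looking at the DOBs of their 4 kids, we know Frankie is the second kid, which meets the 50% threshold, so we take Frankie's DOB, subtract his parents' marriage date from it, and end up with 3 years! Hopefully this is doable with BigQuery Standard SQL.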
Parent table
id  married_couple  married_at  expected_kids
---------------------------------------------
1   Bob and Mary    2010-01-01  4
2   Mick and Jo     2010-01-01  4
Child table
id  child_name  parent_id  date_of_birth
----------------------------------------
1   Eddie       1          2012-01-01
2   Frankie     1          2013-01-01
3   Robbie      1          2005-01-01
4   Duncan      1          2015-01-01
5   Rick        2          2014-01-01
Expected SQL result
parent_id  half_goal_reached(years)
-----------------------------------
1          3
2
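To pin down the rule before looking at any SQL, here is a minimal Python sketch that computes the expected result straight from the two tables. It rests on one assumption that the question only implies: kids born before the marriage date (Robbie, after EDIT2) are not counted, which is what makes Frankie the second kid and the answer 3. The function name `half_goal_reached` and the data layout are illustrative only, and "half" is taken to mean the ceiling of expected_kids / 2.

```python
from datetime import date

# Parent table: id -> (married_at, expected_kids)
parents = {
    1: (date(2010, 1, 1), 4),
    2: (date(2010, 1, 1), 4),
}

# Child table: (child_name, parent_id, date_of_birth)
children = [
    ("Eddie", 1, date(2012, 1, 1)),
    ("Frankie", 1, date(2013, 1, 1)),
    ("Robbie", 1, date(2005, 1, 1)),
    ("Duncan", 1, date(2015, 1, 1)),
    ("Rick", 2, date(2014, 1, 1)),
]


def half_goal_reached(parents, children):
    """Map parent_id to the years until half of expected_kids were born, or None."""
    result = {}
    for parent_id, (married_at, expected_kids) in parents.items():
        # Assumption: only kids born on or after the marriage date count.
        dobs = sorted(
            dob for _, pid, dob in children
            if pid == parent_id and dob >= married_at
        )
        needed = -(-expected_kids // 2)  # ceiling of expected_kids / 2
        if needed == 0 or len(dobs) < needed:
            result[parent_id] = None  # goal not (yet) reached
        else:
            # Calendar-year difference between that kid's DOB and the wedding
            result[parent_id] = dobs[needed - 1].year - married_at.year
    return result


print(half_goal_reached(parents, children))  # {1: 3, 2: None}
```

Parent 2 has only one kid against a goal of four, so the threshold is never met and the result stays empty.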
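Answer 0 (score: 1)
Try the following query, whose logic is a bit too verbose to explain in full. I join the parent and child tables so that each row carries the parent id, the number of years elapsed since the marriage, the running number of children, and the expected number of children. With this information in hand, we can easily find the first row whose running number of children matches or exceeds half of the expected number.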
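SELECT parent_id, num_years AS half_goal_reached
FROM
(
    SELECT parent_id, num_years, cnt, expected_kids,
           -- rank the qualifying rows per parent, earliest first
           -- (partition by parent_id: the derived table exposes no id column)
           ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY num_years) rn
    FROM
    (
        SELECT
            t2.parent_id,
            -- calendar-year difference; EXTRACT is the Standard SQL spelling of YEAR()
            EXTRACT(YEAR FROM t2.date_of_birth) - EXTRACT(YEAR FROM t1.married_at) AS num_years,
            -- running count of this parent's children born on or before this child
            (SELECT COUNT(*) FROM child c
             WHERE c.parent_id = t2.parent_id AND
                   c.date_of_birth <= t2.date_of_birth) AS cnt,
            t1.expected_kids
        FROM parent t1
        INNER JOIN child t2
            ON t1.id = t2.parent_id
    ) t
    -- keep only rows where at least half of the expected kids have been born
    WHERE
        cnt >= expected_kids / 2
) t
WHERE t.rn = 1;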
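Note that there could be issues with the way I computed the year difference, or with how I computed the threshold for half of the expected number of children. Also, on a recent enterprise database we could use an analytic function to get the running number of children instead of a correlated subquery, but I wasn't sure whether BigQuery would support it, so I went with the latter.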
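The two caveats in that note can be made concrete with a small Python sketch. It compares a subtraction of year numbers (what the query does) with a count of fully completed years, and shows which child position a `cnt >= expected_kids / 2` filter selects for odd and even goals. The helper names and the mid-year wedding date are hypothetical, chosen only for illustration.

```python
import math
from datetime import date


def calendar_year_diff(later, earlier):
    """Subtract year numbers: counts the year boundaries crossed."""
    return later.year - earlier.year


def full_years_between(later, earlier):
    """Count completed years between two dates."""
    years = later.year - earlier.year
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1  # the anniversary has not come round yet
    return years


def first_position_meeting_half(expected_kids):
    """Smallest running count cnt that satisfies cnt >= expected_kids / 2."""
    return math.ceil(expected_kids / 2)


married = date(2010, 6, 15)  # hypothetical mid-year wedding
born = date(2013, 1, 1)

print(calendar_year_diff(born, married))  # 3
print(full_years_between(born, married))  # 2
print([first_position_meeting_half(n) for n in (3, 4, 5)])  # [2, 2, 3]
```

In the sample data every date falls on January 1, so the two measures agree there; they diverge only when the birth falls earlier in the calendar year than the wedding anniversary.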
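Answer 1 (score: 1)
Below are two options for BigQuery Standard SQL. The first is a more classic SQL approach; the second is more BigQuery style (I think).
First solution: with an analytic function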
#standardSQL
SELECT
  parent_id,
  IF(
    MAX(pos) = MAX(CAST(expected_kids / 2 AS INT64)),
    MAX(DATE_DIFF(date_of_birth, married_at, YEAR)),
    NULL
  ) AS half_goal_reached
FROM (
  SELECT c.parent_id, c.date_of_birth, expected_kids, married_at,
    ROW_NUMBER() OVER(PARTITION BY c.parent_id ORDER BY c.date_of_birth) AS pos
  FROM `child` AS c
  JOIN `parent` AS p
  ON c.parent_id = p.id
)
WHERE pos <= CAST(expected_kids / 2 AS INT64)
GROUP BY parent_id
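The logic of the first solution (number each child by birth order, keep positions up to half of the goal, and return a value only when that many rows actually exist) can be traced in plain Python. This is a sketch of the idea, not of how BigQuery executes it. It uses the dummy data from this answer, models `CAST(expected_kids / 2 AS INT64)` as rounding halves up (my understanding of BigQuery's FLOAT64 to INT64 cast, so treat it as an assumption), and models `DATE_DIFF(..., YEAR)` as a subtraction of year numbers. The function name is made up.

```python
import math
from datetime import date

# Both couples in the dummy data share the same wedding date and goal.
married_at, expected_kids = date(2010, 1, 1), 4
dobs_by_parent = {
    1: [date(2012, 1, 1), date(2013, 1, 1), date(2014, 1, 1), date(2015, 1, 1)],
    2: [date(2014, 1, 1)],
}


def solve_with_row_numbers(dobs, married_at, expected_kids):
    half = math.floor(expected_kids / 2 + 0.5)  # assumed model of CAST(... AS INT64)
    # ROW_NUMBER() OVER (ORDER BY date_of_birth), then WHERE pos <= half
    kept = [(pos, dob) for pos, dob in enumerate(sorted(dobs), start=1) if pos <= half]
    # IF(MAX(pos) = half, MAX(year difference), NULL)
    if not kept or max(pos for pos, _ in kept) != half:
        return None
    return max(dob.year - married_at.year for _, dob in kept)


for parent_id, dobs in dobs_by_parent.items():
    print(parent_id, solve_with_row_numbers(dobs, married_at, expected_kids))
```

For parent 2 only position 1 survives the filter, so the `MAX(pos) = half` check fails and the result is empty rather than a misleading number.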
Second solution: using ARRAY
#standardSQL
SELECT
  parent_id,
  DATE_DIFF(dates[SAFE_ORDINAL(CAST(expected_kids / 2 AS INT64))], married_at, YEAR) AS half_goal_reached
FROM (
  SELECT
    parent_id,
    ARRAY_AGG(date_of_birth ORDER BY date_of_birth) AS dates,
    MAX(expected_kids) AS expected_kids,
    MAX(married_at) AS married_at
  FROM `child` AS c
  JOIN `parent` AS p
  ON c.parent_id = p.id
  GROUP BY parent_id
)
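The array version leans on `SAFE_ORDINAL`, which does a 1-based array lookup and yields NULL instead of an error when the index is out of range. A rough Python equivalent, with the hypothetical helper `safe_ordinal` standing in for it, shows why a couple with one child and a goal of four comes back as NULL. Rounding halves up for the cast and subtracting year numbers for `DATE_DIFF(..., YEAR)` are assumptions about BigQuery's behaviour, not something stated in this answer.

```python
import math
from datetime import date


def safe_ordinal(items, index):
    """1-based lookup that returns None when index is out of range."""
    if 1 <= index <= len(items):
        return items[index - 1]
    return None


def solve_with_array(dobs, married_at, expected_kids):
    dates = sorted(dobs)                        # ARRAY_AGG(... ORDER BY date_of_birth)
    half = math.floor(expected_kids / 2 + 0.5)  # assumed model of CAST(... AS INT64)
    picked = safe_ordinal(dates, half)
    # A NULL date propagates to a NULL difference
    return None if picked is None else picked.year - married_at.year


married_at = date(2010, 1, 1)
bob_and_mary = [date(2012, 1, 1), date(2013, 1, 1), date(2014, 1, 1), date(2015, 1, 1)]
mick_and_jo = [date(2014, 1, 1)]

print(solve_with_array(bob_and_mary, married_at, 4))  # 3
print(solve_with_array(mick_and_jo, married_at, 4))   # None
```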
Dummy data
You can test / play with both solutions by putting the dummy data below in front of either query:
#standardSQL
WITH `parent` AS (
SELECT 1 id, 'Bob and Mary' married_couple, DATE '2010-01-01' married_at, 4 expected_kids UNION ALL
SELECT 2, 'Mick and Jo', DATE '2010-01-01', 4
),
`child` AS (
SELECT 1 id, 'Eddie' child_name, 1 parent_id, DATE '2012-01-01' date_of_birth UNION ALL
SELECT 2, 'Frankie', 1, DATE '2013-01-01' UNION ALL
SELECT 3, 'Robbie', 1, DATE '2014-01-01' UNION ALL
SELECT 4, 'Duncan', 1, DATE '2015-01-01' UNION ALL
SELECT 5, 'Rick', 2, DATE '2014-01-01'
)