Input:
+---------+---------+--------+
| row_min | row_max | tCount |
+---------+---------+--------+
|       2 |       4 |      1 |
|       7 |      10 |      2 |
|      13 |      14 |      3 |
+---------+---------+--------+
Required output:
+-----+--------+
| row | tcount |
+-----+--------+
|   2 |      1 |
|   3 |      1 |
|   4 |      1 |
|   7 |      2 |
|   8 |      2 |
|   9 |      2 |
|  10 |      2 |
|  13 |      3 |
|  14 |      3 |
+-----+--------+
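Each range from row_min to row_max is expanded in the output into one row per value, carrying the tcount of that range. This step is part of a data transformation that I need to perform in SQL on a dataset that resides in Amazon Redshift, and I am stuck on this particular step. Please provide SQL code for it, preferably limited to joins and analytic functions.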
Answer 0 (score: 2)
You can use a tally (numbers) table large enough to contain every number up to the table's MAX(row_max):
WITH Tally AS (
    -- 10 x 10 cross join yields the numbers 1..100; add another
    -- cross join if MAX(row_max) can exceed 100
    SELECT ROW_NUMBER() OVER() AS n
    FROM (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ) x(n)
    CROSS JOIN (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ) y(n)
)
SELECT t.n AS "row", m.tCount
FROM Tally AS t
-- keep each tally number that falls inside a [row_min, row_max] range
INNER JOIN mytable AS m ON t.n >= m.row_min AND t.n <= m.row_max
ORDER BY t.n
I believe Redshift supports simple, non-recursive CTEs, so the above should work.
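To sanity-check the join logic against the sample data without a Redshift cluster, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in. This is an assumption for illustration only: SQLite is not Redshift, and the tally table here is filled from Python rather than with ROW_NUMBER(). The helper name expand_ranges and the tally_size parameter are hypothetical. Only the range join itself mirrors the query above.

```python
import sqlite3


def expand_ranges(rows, tally_size=100):
    """Expand (row_min, row_max, tCount) ranges via a tally-table join.

    Hypothetical helper: SQLite stands in for Redshift here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (row_min INTEGER, row_max INTEGER, tCount INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

    # Tally table holding 1..tally_size; it must reach MAX(row_max),
    # otherwise values above tally_size are silently missing.
    conn.execute("CREATE TABLE tally (n INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO tally VALUES (?)", [(i,) for i in range(1, tally_size + 1)]
    )

    # Same range join as in the answer: one output row per tally number
    # that falls inside a [row_min, row_max] interval.
    result = conn.execute(
        """
        SELECT t.n, m.tCount
        FROM tally AS t
        INNER JOIN mytable AS m
            ON t.n >= m.row_min AND t.n <= m.row_max
        ORDER BY t.n
        """
    ).fetchall()
    conn.close()
    return result


sample = [(2, 4, 1), (7, 10, 2), (13, 14, 3)]
expanded = expand_ranges(sample)
for row, tcount in expanded:
    print(row, tcount)
```

Note that a range reaching past the tally size is truncated rather than rejected, which is why the tally has to cover MAX(row_max).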