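I want to split the output of the initial query below into 5 rows, each holding the MIN and MAX id of a range. The sum of the row counts inside each range should be "nearly" equal across the 5 rows.
Initial query: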
select id, count(id) as rcnt from table group by id order by id;
id   | rcnt
-----|-----
1111 | 15
2222 | 35
3333 | 25
5555 | 30
6666 | 20
7777 | 35
8888 | 50
9999 | 50
Total: 260
Target sum per row: ceil(260/5) = 52
Expected output:
min_id | max_id | sum (optional)
---- |--------|----
1111 | 2222 | 50
3333 | 5555 | 55
6666 | 7777 | 55
8888 | 8888 | 50
9999 | 9999 | 50
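I achieved this by feeding the output of the initial query into my own algorithm written in Perl. Is it possible to get the same expected output using only one query?
In case anyone is curious why I am doing this: I use SPOOL to dump these id ranges into 5 files, and each file is then processed in parallel.
Any suggestions for optimizing this process are greatly appreciated.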
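For reference, the target arithmetic and one possible procedural split can be sketched as follows. This is only an assumed greedy rule (close a range once adding the next id would move its sum further from the target), not the asker's actual Perl algorithm, which is not shown. The names split_ranges and counts are hypothetical.

```python
import math

# Per-id row counts copied from the initial query's sample output.
counts = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def split_ranges(counts, parts):
    """Greedy sketch: returns a list of (min_id, max_id, sum) tuples."""
    total = sum(cnt for _, cnt in counts)
    target = math.ceil(total / parts)  # ceil(260 / 5) = 52 for the sample
    ranges = []
    lo = hi = None
    acc = 0
    for id_, cnt in counts:
        # Close the current range if adding this id would move the
        # running sum further away from the target than it is now.
        if lo is not None and abs(acc - target) <= abs(acc + cnt - target):
            ranges.append((lo, hi, acc))
            lo, acc = None, 0
        if lo is None:
            lo = id_
        hi = id_
        acc += cnt
    if lo is not None:
        ranges.append((lo, hi, acc))
    return ranges


ranges = split_ranges(counts, 5)
for row in ranges:
    print(row)
```

On the sample data this reproduces the five expected ranges. The rule does not guarantee exactly 5 ranges for arbitrary data, so treat it as an illustration of the goal rather than a general solution.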
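Answer 0 (score: 1)
One option uses ROW_NUMBER. My approach is to assign a row number, starting from 1, to each row of your current output, ordered by id. Then use the formula FLOOR((rn-1)/2) to form the groups. This formula puts the first two rows into one group, the next two rows into the next group, and so on.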
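WITH cte AS (
    -- One row per id, numbered 1, 2, 3, ... in id order.
    -- "table" is the question's placeholder table name.
    SELECT id, COUNT(id) AS rcnt,
           ROW_NUMBER() OVER (ORDER BY id) rn
    FROM table
    GROUP BY id
)
SELECT
    MIN(id) AS min_id,
    MAX(id) AS max_id,
    SUM(rcnt) AS rcnt
FROM cte
-- rn 1,2 -> group 0; rn 3,4 -> group 1; and so on.
GROUP BY FLOOR((rn-1) / 2)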
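To see what this pairing does on the question's sample data, here is a small sketch that mimics the FLOOR((rn-1)/2) grouping outside the database. The function name pair_groups is hypothetical, and the counts are copied from the question.

```python
# Per-id row counts copied from the question's sample output.
counts = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def pair_groups(counts):
    """Group consecutive ids two at a time, like FLOOR((rn-1)/2)."""
    groups = {}
    for rn, (id_, rcnt) in enumerate(sorted(counts), start=1):
        key = (rn - 1) // 2  # integer division plays the role of FLOOR
        lo, hi, total = groups.get(key, (id_, id_, 0))
        groups[key] = (min(lo, id_), max(hi, id_), total + rcnt)
    return [groups[k] for k in sorted(groups)]


pairs = pair_groups(counts)
for row in pairs:
    print(row)
```

With 8 ids this yields 4 ranges rather than 5, and the last one (8888 to 9999) sums to 100, because the grouping depends only on row position and not on the counts.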
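Answer 1 (score: 1)
Assign each row, in ascending id order, to one of N buckets (5 buckets in this case). Then, for each id, work out the average bucket that its rows fall into and reassign all rows of that id to that bucket (so the buckets may end up holding uneven numbers of rows). You can then find the minimum and maximum id for each bucket:
Oracle 11g R2 schema setup: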
CREATE TABLE table_name ( id ) AS
SELECT 1111 FROM DUAL CONNECT BY LEVEL <= 15 UNION ALL
SELECT 2222 FROM DUAL CONNECT BY LEVEL <= 35 UNION ALL
SELECT 3333 FROM DUAL CONNECT BY LEVEL <= 25 UNION ALL
SELECT 5555 FROM DUAL CONNECT BY LEVEL <= 30 UNION ALL
SELECT 6666 FROM DUAL CONNECT BY LEVEL <= 20 UNION ALL
SELECT 7777 FROM DUAL CONNECT BY LEVEL <= 35 UNION ALL
SELECT 8888 FROM DUAL CONNECT BY LEVEL <= 50 UNION ALL
SELECT 9999 FROM DUAL CONNECT BY LEVEL <= 50;
Query 1:
SELECT MIN( id ) AS min_id,
       MAX( id ) AS max_id,
       SUM( cnt ) AS "sum"
FROM (
  SELECT id,
         COUNT(*) AS cnt,
         ROUND( AVG( grp ) ) AS grp
  FROM (
    SELECT id,
           CEIL(
             ROW_NUMBER() OVER ( ORDER BY id )
             / ( COUNT(*) OVER () + 1 )
             * 5 -- Number of buckets to assign rows to.
           ) AS grp
    FROM table_name
    ORDER BY id
  )
  GROUP BY id
)
GROUP BY grp
Results (the outer query has no ORDER BY, so the row order is not guaranteed):
| MIN_ID | MAX_ID | sum |
|--------|--------|-----|
| 1111 | 2222 | 50 |
| 3333 | 5555 | 55 |
| 8888 | 8888 | 50 |
| 9999 | 9999 | 50 |
| 6666 | 7777 | 55 |
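The bucket arithmetic in Query 1 can be checked by hand with a short sketch like the one below. It is an assumed re-implementation of the same steps in plain code, not the database's execution plan. The name bucket_ranges is hypothetical, and rounding half up is used to match ROUND for the positive values that occur here.

```python
from fractions import Fraction
from math import floor

# Per-id row counts matching the schema setup above.
counts = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def bucket_ranges(counts, buckets):
    """Mimic Query 1: bucket each row, average per id, then regroup."""
    total_rows = sum(cnt for _, cnt in counts)
    rn = 0  # running ROW_NUMBER() over all rows, ordered by id
    result = {}
    for id_, cnt in sorted(counts):
        grp_sum = 0
        for _ in range(cnt):
            rn += 1
            # CEIL(rn / (total_rows + 1) * buckets) using exact integer math
            grp_sum += -(-rn * buckets // (total_rows + 1))
        # ROUND(AVG(grp)): exact average, rounded half up
        grp = floor(Fraction(grp_sum, cnt) + Fraction(1, 2))
        lo, hi, total = result.get(grp, (id_, id_, 0))
        result[grp] = (min(lo, id_), max(hi, id_), total + cnt)
    return [result[g] for g in sorted(result)]


buckets = bucket_ranges(counts, 5)
for row in buckets:
    print(row)
```

For example, id 3333 occupies row numbers 51 to 75: two of its rows land in bucket 1 and 23 in bucket 2, so its average is 1.92 and it is assigned to bucket 2. Sorted by bucket, the sketch gives the same five ranges as the results table.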