How to select "N" distinct id ranges by evenly dividing the row count?

Date: 2018-04-03 03:54:41

Tags: sql oracle

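I want to split the output of the initial query below into 5 rows, each holding a MIN and MAX id that bound a range. The sum of the row counts (rcnt) inside each MIN-MAX range should be "almost" equal across the 5 rows.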

Initial query:

select id, count(id) as rcnt from table_name group by id order by id;

id    | rcnt
------|-----
1111  | 15
2222  | 35
3333  | 25
5555  | 30
6666  | 20
7777  | 35
8888  | 50
9999  | 50

Total: 260

Target sum per output row: ceil(260/5) = 52

Expected output:

min_id | max_id | sum (optional)
----   |--------|----
1111   | 2222   | 50
3333   | 5555   | 55
6666   | 7777   | 55
8888   | 8888   | 50
9999   | 9999   | 50

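I currently do this with my own algorithm, written in Perl, that works on the output of the initial query. Is it possible to get the same expected output with a single query?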
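For comparison, here is one possible client-side heuristic of the kind described above, written in Python rather than Perl. It is a hypothetical sketch, not the asker's actual algorithm: the function name `greedy_ranges` and the rule "extend the current range only if that brings its sum closer to ceil(total / N)" are assumptions. On the sample counts it reproduces the expected output, but it does not guarantee exactly N ranges for other data.

```python
import math
from typing import List, Tuple

# (id, rcnt) pairs from the initial query, ordered by id.
COUNTS = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def greedy_ranges(counts: List[Tuple[int, int]], buckets: int) -> List[Tuple[int, int, int]]:
    """Hypothetical heuristic: return (min_id, max_id, sum) ranges with sums near ceil(total / buckets)."""
    target = math.ceil(sum(cnt for _, cnt in counts) / buckets)
    ranges = []
    start_id, end_id, acc = counts[0][0], counts[0][0], counts[0][1]
    for id_, cnt in counts[1:]:
        # Extend the current range only if that moves its sum closer to the target.
        if abs(acc + cnt - target) < abs(acc - target):
            end_id, acc = id_, acc + cnt
        else:
            # Otherwise close the current range and start a new one at this id.
            ranges.append((start_id, end_id, acc))
            start_id, end_id, acc = id_, id_, cnt
    ranges.append((start_id, end_id, acc))
    return ranges


print(greedy_ranges(COUNTS, 5))
# [(1111, 2222, 50), (3333, 5555, 55), (6666, 7777, 55), (8888, 8888, 50), (9999, 9999, 50)]
```

The "closer to the target" test is what lets a range overshoot 52 (3333 to 5555 sums to 55) instead of always stopping below it.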

In case anyone is curious why I'm doing this: I use SPOOL to dump these id ranges into 5 files, and each file is then processed in parallel.

Any suggestions for optimizing this process are greatly appreciated.

2 Answers:

Answer 0 (score: 1)

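One option uses ROW_NUMBER. My approach is to assign each row of your current output a row number starting at 1, ordered by id. Groups are then formed with the formula FLOOR((rn-1)/2), which puts the first two rows together, then the next two rows together, and so on.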

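WITH cte AS (
    -- One row per id: its row count and its position in id order.
    SELECT id, COUNT(id) AS rcnt,
           ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM table_name
    GROUP BY id
)
-- Pair up consecutive ids: rn 1-2 -> group 0, rn 3-4 -> group 1, ...
SELECT
    MIN(id) AS min_id,
    MAX(id) AS max_id,
    SUM(rcnt) AS rcnt
FROM cte
GROUP BY FLOOR((rn - 1) / 2)
ORDER BY min_id;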
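To see what this grouping key does with the sample counts from the question, the Python sketch below replays `FLOOR((rn - 1) / 2)` over the (id, rcnt) pairs. The helper `pairwise_groups` is hypothetical and only mirrors the SQL for illustration. Because rows are paired by position rather than by rcnt, the eight sample ids yield four groups instead of five, and the last group (8888 to 9999) sums to 100.

```python
from typing import List, Tuple

# (id, rcnt) pairs from the question, ordered by id.
COUNTS = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def pairwise_groups(counts: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: group ids the way FLOOR((rn - 1) / 2) does."""
    groups = {}  # group key -> (min_id, max_id, sum of rcnt); dicts keep insertion order
    for rn, (id_, cnt) in enumerate(counts, start=1):  # rn mimics ROW_NUMBER() OVER (ORDER BY id)
        key = (rn - 1) // 2                            # FLOOR((rn - 1) / 2)
        lo, hi, total = groups.get(key, (id_, id_, 0))
        groups[key] = (min(lo, id_), max(hi, id_), total + cnt)
    return list(groups.values())


print(pairwise_groups(COUNTS))
# [(1111, 2222, 50), (3333, 5555, 55), (6666, 7777, 55), (8888, 9999, 100)]
```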

Answer 1 (score: 1)

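Assign each row to one of 5 buckets in ascending order of id. Then, for each id, compute the average bucket number of its rows and reassign all of that id's rows to that (rounded) bucket, so the buckets may end up with unequal counts. Finally, find the minimum and maximum id of each bucket.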

SQL Fiddle

Oracle 11g R2 Schema Setup

CREATE TABLE table_name ( id ) AS
  SELECT 1111 FROM DUAL CONNECT BY LEVEL <= 15 UNION ALL
  SELECT 2222 FROM DUAL CONNECT BY LEVEL <= 35 UNION ALL
  SELECT 3333 FROM DUAL CONNECT BY LEVEL <= 25 UNION ALL
  SELECT 5555 FROM DUAL CONNECT BY LEVEL <= 30 UNION ALL
  SELECT 6666 FROM DUAL CONNECT BY LEVEL <= 20 UNION ALL
  SELECT 7777 FROM DUAL CONNECT BY LEVEL <= 35 UNION ALL
  SELECT 8888 FROM DUAL CONNECT BY LEVEL <= 50 UNION ALL
  SELECT 9999 FROM DUAL CONNECT BY LEVEL <= 50;

Query 1

SELECT MIN( id ) AS min_id,
       MAX( id ) AS max_id,
       SUM( cnt ) AS "sum"
FROM   (
  SELECT id,
         COUNT(*) AS cnt,
         ROUND( AVG( grp ) ) AS grp              -- Average bucket of this id's rows.
  FROM   (
    SELECT id,
           CEIL(
             ROW_NUMBER() OVER ( ORDER BY id )
             / ( COUNT(*) OVER () + 1 )
             * 5                                 -- Number of buckets to assign rows to.
           ) AS grp
    FROM   table_name
    ORDER BY id
  )
  GROUP BY id
)
GROUP BY grp
ORDER BY min_id

Results:

| MIN_ID | MAX_ID | sum |
|--------|--------|-----|
|   1111 |   2222 |  50 |
|   3333 |   5555 |  55 |
|   6666 |   7777 |  55 |
|   8888 |   8888 |  50 |
|   9999 |   9999 |  50 |
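The bucketing arithmetic in Query 1 can be replayed in Python to check where the bucket boundaries fall. This is a sketch under stated assumptions: the hypothetical `avg_bucket_ranges` takes the (id, rcnt) pairs already sorted by id, uses exact integer ceiling division for `CEIL(rn / (n + 1) * 5)`, and rounds the average half away from zero like Oracle's `ROUND`. With 260 rows the five buckets cover row numbers 1-52, 53-104, 105-156, 157-208 and 209-260, and the per-id averages reproduce the result table above.

```python
from typing import List, Tuple

# (id, rcnt) pairs from the question, ordered by id (assumed input shape).
COUNTS = [(1111, 15), (2222, 35), (3333, 25), (5555, 30),
          (6666, 20), (7777, 35), (8888, 50), (9999, 50)]


def avg_bucket_ranges(counts: List[Tuple[int, int]], buckets: int = 5) -> List[Tuple[int, int, int]]:
    """Hypothetical replay of Query 1: returns (min_id, max_id, sum) per bucket."""
    n = sum(cnt for _, cnt in counts)            # COUNT(*) OVER ()
    rn = 0
    per_id = []                                  # (id, rcnt, rounded average bucket)
    for id_, cnt in counts:
        grp_sum = 0
        for _ in range(cnt):                     # one iteration per underlying table row
            rn += 1                              # ROW_NUMBER() OVER (ORDER BY id)
            grp_sum += -(-(rn * buckets) // (n + 1))  # exact CEIL(rn / (n + 1) * buckets)
        # ROUND(AVG(grp)), half away from zero (all values are positive here).
        per_id.append((id_, cnt, (2 * grp_sum + cnt) // (2 * cnt)))
    ranges = {}                                  # bucket -> (min_id, max_id, sum)
    for id_, cnt, grp in per_id:
        lo, hi, total = ranges.get(grp, (id_, id_, 0))
        ranges[grp] = (min(lo, id_), max(hi, id_), total + cnt)
    return [ranges[g] for g in sorted(ranges)]


print(avg_bucket_ranges(COUNTS))
# [(1111, 2222, 50), (3333, 5555, 55), (6666, 7777, 55), (8888, 8888, 50), (9999, 9999, 50)]
```

Integer ceiling division is used because `rn / (n + 1) * 5` in floating point could land a hair off an integer for other row counts; for these 260 rows no product is an exact integer, so the result is the same either way.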