A better way to roll up to a default value

Time: 2020-04-15 20:50:29

Tags: sql oracle join group-by query-optimization

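In this example I have three tables (individual, business, and ind_to_business). individual holds information about people. business holds information about businesses. ind_to_business records which individuals are linked to which business. Here is their DDL: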

CREATE TABLE individual
(
 ID INTEGER PRIMARY KEY,
 NAME VARCHAR2(100) NOT NULL,
 ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
 ID INTEGER PRIMARY KEY,
 NAME VARCHAR2(100) NOT NULL,
 ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
  ID INTEGER PRIMARY KEY,
  IND_ID REFERENCES individual(id),
  BUS_ID REFERENCES business(id),
  START_DT DATE NOT NULL,
  END_DT DATE
);

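I am looking for the best way to show one row per individual. If an individual is linked to one business, I want to show that business's ENTERPRISE_ID. If they are linked to more than one business, I want to show the default value 'Multiple'. Individuals are always linked to at least one business, so a LEFT JOIN is not needed. An individual can also be linked to the same business more than once (leaving and coming back); multiple records for the same business should be rolled up into one.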

For the following sample data:

individual:

+----+------------+---------------+
| ID |    NAME    | ENTERPRISE_ID |
+----+------------+---------------+
|  1 | John Smith | 53a23B7       |
|  2 | Jane Doe   | 63f2a35       |
+----+------------+---------------+

business:

+----+----------+---------------+
| ID |   NAME   | ENTERPRISE_ID |
+----+----------+---------------+
|  3 | ABC Corp | 2a34d9b       |
|  4 | XYZ Inc  | 34bf21e       |
+----+----------+---------------+

ind_to_business:

+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID |  START_DT   |   END_DT    |
+----+--------+--------+-------------+-------------+
|  5 |      1 |      3 | 01-JAN-2000 | 31-DEC-2002 |
|  6 |      1 |      3 | 01-JAN-2015 |             |
|  7 |      2 |      3 | 01-JAN-2000 |             |
|  8 |      2 |      4 | 01-MAR-2006 | 05-JUN-2010 |
|  9 |      2 |      4 | 15-DEC-2019 |             |
+----+--------+--------+-------------+-------------+
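
To make this easier to reproduce, the sample rows above can be loaded with INSERT statements along these lines. This is only a sketch that restates the three tables; the ANSI DATE literals are my rendering of the dates shown, and an empty END_DT is inserted as NULL.

```sql
-- individual: one row per person
INSERT INTO individual (id, name, enterprise_id) VALUES (1, 'John Smith', '53a23B7');
INSERT INTO individual (id, name, enterprise_id) VALUES (2, 'Jane Doe', '63f2a35');

-- business: one row per business
INSERT INTO business (id, name, enterprise_id) VALUES (3, 'ABC Corp', '2a34d9b');
INSERT INTO business (id, name, enterprise_id) VALUES (4, 'XYZ Inc', '34bf21e');

-- ind_to_business: link rows; a NULL end_dt marks a link that is still open
INSERT INTO ind_to_business (id, ind_id, bus_id, start_dt, end_dt)
VALUES (5, 1, 3, DATE '2000-01-01', DATE '2002-12-31');
INSERT INTO ind_to_business (id, ind_id, bus_id, start_dt, end_dt)
VALUES (6, 1, 3, DATE '2015-01-01', NULL);
INSERT INTO ind_to_business (id, ind_id, bus_id, start_dt, end_dt)
VALUES (7, 2, 3, DATE '2000-01-01', NULL);
INSERT INTO ind_to_business (id, ind_id, bus_id, start_dt, end_dt)
VALUES (8, 2, 4, DATE '2006-03-01', DATE '2010-06-05');
INSERT INTO ind_to_business (id, ind_id, bus_id, start_dt, end_dt)
VALUES (9, 2, 4, DATE '2019-12-15', NULL);
```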

I expect the following output:

+---------+------------+------------+
| IND_ID  |    NAME    | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b    |
| 63f2a35 | Jane Doe   | Multiple   |
+---------+------------+------------+

Here is my current query:

SELECT DISTINCT
       sub.ind_id,
       sub.name,
       DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID, 
             i.name,
             b.enterprise_id AS BUS_ID,
             COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
      FROM individual i
      INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
      INNER JOIN business b ON i2b.bus_id = b.id) sub;

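My query works, but it runs against a large dataset and takes a long time. I am wondering whether anyone has ideas for improving it so that less processing is wasted (for example, having to apply DISTINCT to the final result, or computing COUNT(DISTINCT) in the inline view only to use the value in the DECODE above).

I have also created a DBFiddle for this question (Link).

Thanks for any input.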

4 answers:

Answer 0: (score: 2)

You could try a correlated subquery. This removes the need for the outer DISTINCT:

SELECT 
    i.enterprise_id ind_id,
    i.name,
    (
        SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
        FROM ind_to_business i2b
        INNER JOIN business b ON i2b.bus_id = b.id
        WHERE i2b.ind_id = i.id
    ) linked_bus
FROM individual i
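
One assumption behind this approach is worth stating: the subquery is correlated on ind_to_business.ind_id, so it usually depends on an index that leads with that column. Oracle does not create indexes on foreign key columns automatically, so the DDL in the question does not provide one. A minimal sketch follows; the index name is hypothetical, and whether it helps depends on the data and the execution plan.

```sql
-- Hypothetical index name. Leading with ind_id supports the per-individual lookup;
-- including bus_id lets the join key be read from the index entry.
CREATE INDEX ind_to_business_ind_bus_ix
    ON ind_to_business (ind_id, bus_id);
```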

Answer 1: (score: 1)

You can join individual to ind_to_business aggregated per individual. One approach:

select i.enterprise_id as ind_id, i.name, coalesce(b.enterprise_id, 'Multiple') as linked_bus
from individual i
join
(
  select
    ind_id,
    case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
  from ind_to_business
  group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
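
The same min/max comparison can also be applied to the business ENTERPRISE_ID directly in a single GROUP BY, which avoids both DISTINCT and the window function from the question. This is a sketch of a variant, not a measured improvement; it assumes, as the question states, that every individual has at least one link row.

```sql
-- If the smallest and largest linked enterprise_id are equal,
-- the individual is linked to exactly one distinct business.
SELECT i.enterprise_id AS ind_id,
       i.name,
       CASE
         WHEN MIN(b.enterprise_id) = MAX(b.enterprise_id) THEN MIN(b.enterprise_id)
         ELSE 'Multiple'
       END AS linked_bus
FROM individual i
JOIN ind_to_business i2b ON i2b.ind_id = i.id
JOIN business b ON b.id = i2b.bus_id
GROUP BY i.id, i.enterprise_id, i.name
ORDER BY i.id;
```

With the sample data, John Smith's two link rows both resolve to 2a34d9b, so that value is returned; Jane Doe's rows resolve to 2a34d9b and 34bf21e, so the result is 'Multiple'.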

Answer 2: (score: 0)

There is no need to use DISTINCT twice. You can use subquery factoring: put the inline view in a WITH clause and make the dataset DISTINCT in the subquery itself:

WITH data AS
(
  SELECT distinct 
       i.enterprise_id AS IND_ID, 
       i.name,
       b.enterprise_id AS BUS_ID
  FROM individual i
  JOIN ind_to_business i2b ON i.id = i2b.ind_id
  JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
       name,
       case 
         when count(*) = 1 then MIN(bus_id)
         else 'Multiple' 
       end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;

IND_ID     NAME       LINKED_BUS               
---------- ---------- -------------------------
53a23B7    John Smith 2a34d9b                  
63f2a35    Jane Doe   Multiple

Answer 3: (score: 0)

First, use a subquery to get all the dimensions you need, then use a CASE expression to do the final aggregation.

select
    ind_id,
    name,
    case
        when count(*) > 1 then 'Multiple'
        else min(bus_id)
    end as linked_bus
from
(
    select 
        distinct i.enterprise_id as ind_id, 
        i.name,
        b.enterprise_id as bus_id
    from individual i

    join ind_to_business i2b 
    on i.id = i2b.ind_id

    join business b 
    on i2b.bus_id = b.id
) vals

group by
    ind_id,
    name
order by
    ind_id