Concerned about query size when joining on non-unique columns

Time: 2016-05-23 23:16:44

Tags: sql oracle

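I have a situation at work. I work in housing. We raise orders against dwellings (so that our contractors can go out and repair them).

An order contains one or more jobs. A dwelling has zero, one or many orders raised against it.

Here is a brief data definition. I have simplified the tables, but hopefully you get the idea. An order can contain many jobs, and a dwelling can have many orders.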

CREATE TABLE dwellings (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
address VARCHAR2(100) NOT NULL
);

CREATE TABLE orders (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
created_by VARCHAR2(10) NOT NULL,
created_on DATE NOT NULL,
dwelling_id VARCHAR2(10) NOT NULL REFERENCES dwellings(id)
);

CREATE TABLE jobs (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
sor_id VARCHAR2(10) NOT NULL,
order_id VARCHAR2(10) NOT NULL REFERENCES orders(id)
);

And to populate them:

INSERT INTO dwellings VALUES ('00ABC', '2 The Mews House Little Boston London E1 1EE');
INSERT INTO dwellings VALUES ('5H88H', '3 Electric House Snodsbury S1 1IT');

INSERT INTO orders VALUES ('000001-A', 'CSMITH', DATE '2016-03-10', '00ABC');
INSERT INTO orders VALUES ('000002-A', 'CSMITH', DATE '2016-03-11', '00ABC');
INSERT INTO orders VALUES ('000003-A', 'AJONES', DATE '2016-03-16', '00ABC');
INSERT INTO orders VALUES ('000004-A', 'CSMITH', DATE '2016-03-16', '5H88H');

INSERT INTO jobs VALUES ('001', '000AA0', '000001-A');
INSERT INTO jobs VALUES ('002', '123BB0', '000001-A');
INSERT INTO jobs VALUES ('003', '000AA0', '000002-A');
INSERT INTO jobs VALUES ('004', '787XD7', '000003-A');
INSERT INTO jobs VALUES ('005', '000AA0', '000003-A');
INSERT INTO jobs VALUES ('006', '787XD7', '000004-A');

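The analysts want to find agents who raise orders that are similar to previous orders. What comes under scrutiny is SOR_ID, which denotes the type of job. Remember that each order has one or more jobs. So the task is: produce a report showing orders that contain one or more job types repeated from a previous order against the same dwelling.

The report I am producing will have these column headings.

  • Agent name
  • Order ID
  • Address
  • Previous order ID
  • Repeated job types

Here is the start of the query. I have not executed it against the database, because there are 50,000 dwellings, 100,000 orders and 200,000 jobs. I am concerned about the size of the joined result, because I am joining on a column that is not unique.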

select * from orders ord 
join orders ord2 on ord.dwelling_id = ord2.dwelling_id --shaky
    and ord.id <> ord2.id
    and ord.created_on - ord2.created_on between 0 and 90
join jobs job on job.order_id = ord.id
join jobs job2 on job2.order_id = ord2.id
where job.sor_id = job2.sor_id

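I am looking for advice on how to refactor this query into something more manageable (no PL/SQL). Note that I am not using LAG / LEAD, and I am not yet using LISTAGG to collapse the job type codes. That will come later. For now I am concerned about how expensive the query is.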

2 Answers:

Answer 0: (score: 1)

Query

SELECT o.created_by AS agent_name,
       d.address,
       LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
       j.sor_id AS job_type
FROM   dwellings d
       INNER JOIN orders o
       ON ( o.dwelling_id = d.id )
       INNER JOIN jobs j
       ON ( j.order_id = o.id )
GROUP BY o.created_by, d.address, j.sor_id
HAVING COUNT(1) > 1;

Output

AGENT_NAME ADDRESS                                      ORDER_IDS         JOB_TYPE 
---------- -------------------------------------------- ----------------- ----------
CSMITH     2 The Mews House Little Boston London E1 1EE 000001-A,000002-A 000AA0     

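This lists jobs of the same type that were placed at the same address by the same agent under different order IDs. The orders appear in chronological order in a comma-separated list.

However, if you want the column headings from the question, you could do this: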

SELECT *
FROM   (
  SELECT o.created_by AS agent_name,
         o.id,
         d.address,
         LAG( o.id ) OVER ( PARTITION BY o.created_by, d.address, j.sor_id
                            ORDER BY o.created_on
                          ) AS previous_order_id,
         j.sor_id AS job_type
  FROM   dwellings d
         INNER JOIN orders o
         ON ( o.dwelling_id = d.id )
         INNER JOIN jobs j
         ON ( j.order_id = o.id )
)
WHERE  previous_order_id IS NOT NULL;

Which would output:

AGENT_NAME ID         ADDRESS                                      PREVIOUS_ORDER_ID JOB_TYPE 
---------- ---------- -------------------------------------------- ----------------- ----------
CSMITH     000002-A   2 The Mews House Little Boston London E1 1EE 000001-A          000AA0   

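If you want to consider multiple agents, you can remove o.created_by from the GROUP BY and PARTITION BY clauses. For the first query you would then need LISTAGG to get all the agents. Like this: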

SELECT LISTAGG( o.created_by, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS agent_name,
       d.address,
       LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
       j.sor_id AS job_type
FROM   dwellings d
       INNER JOIN orders o
       ON ( o.dwelling_id = d.id )
       INNER JOIN jobs j
       ON ( j.order_id = o.id )
GROUP BY d.address, j.sor_id
HAVING COUNT(1) > 1;

Or, for the second query, like this:

SELECT *
FROM   (
  SELECT o.created_by AS agent_name,
         o.id,
         d.address,
         LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
                            ORDER BY o.created_on
                          ) AS previous_order_id,
         j.sor_id AS job_type
  FROM   dwellings d
         INNER JOIN orders o
         ON ( o.dwelling_id = d.id )
         INNER JOIN jobs j
         ON ( j.order_id = o.id )
)
WHERE  previous_order_id IS NOT NULL;

Both queries would then also output the order with ID 000003-A, which was placed by AJONES.
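
Neither LAG query above applies the 90-day window from the question. The following is a minimal sketch of one way to add it, not a tested solution. It assumes that "previous order" means the immediately preceding order at the same address with the same job type, and that an order never contains the same SOR_ID twice. The alias previous_created_on and the extra o.id tie-breaker in the ORDER BY are introduced here for illustration.

```sql
SELECT agent_name, id, address, previous_order_id, job_type
FROM   (
  SELECT o.created_by AS agent_name,
         o.id,
         d.address,
         o.created_on,
         -- ID of the preceding order with the same address and job type
         LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
                            ORDER BY o.created_on, o.id
                          ) AS previous_order_id,
         -- Creation date of that same preceding order (assumed alias)
         LAG( o.created_on ) OVER ( PARTITION BY d.address, j.sor_id
                                    ORDER BY o.created_on, o.id
                                  ) AS previous_created_on,
         j.sor_id AS job_type
  FROM   dwellings d
         INNER JOIN orders o
         ON ( o.dwelling_id = d.id )
         INNER JOIN jobs j
         ON ( j.order_id = o.id )
)
WHERE  previous_order_id IS NOT NULL
AND    created_on - previous_created_on <= 90;
```

Unlike the self-join in the question, this compares each order only with its nearest predecessor, so an order that repeats a job type from several earlier orders appears once rather than once per earlier order.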

Answer 1: (score: 1)

I would try changing:

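  • ord.id <> ord2.id to ord2.id < ord.id (this assumes later orders always have larger IDs; not sure whether that holds for you)

  • ord.created_on - ord2.created_on between 0 and 90 to ord2.created_on <= ord.created_on and ord2.created_on >= ord.created_on - 90 (not sure whether the RDBMS can make this optimization by itself)

  • Move job.sor_id = job2.sor_id into the ON clause (though the RDBMS will probably do this for you)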

select * from orders ord 
join orders ord2 
    on  ord2.dwelling_id = ord.dwelling_id
    and ord2.id < ord.id
    and ord2.created_on <= ord.created_on        
    and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2 
    on  job2.order_id = ord2.id
    and job2.sor_id   = job.sor_id;
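
The question mentions collapsing the job type codes with LISTAGG later. As a hedged sketch of how the query above might be extended to the report columns listed in the question, the version below joins dwellings for the address and groups by each pair of orders. The column aliases are illustrative, and it assumes an order does not contain the same SOR_ID twice (otherwise a code would be repeated in the list).

```sql
SELECT ord.created_by AS agent_name,
       ord.id         AS order_id,
       d.address,
       ord2.id        AS previous_order_id,
       -- One comma-separated list of shared job types per order pair
       LISTAGG( job.sor_id, ',' )
         WITHIN GROUP ( ORDER BY job.sor_id ) AS repeated_job_types
FROM   orders ord
       JOIN dwellings d
       ON  d.id = ord.dwelling_id
       JOIN orders ord2
       ON  ord2.dwelling_id = ord.dwelling_id
       AND ord2.id < ord.id
       AND ord2.created_on <= ord.created_on
       AND ord2.created_on >= ord.created_on - 90
       JOIN jobs job
       ON  job.order_id = ord.id
       JOIN jobs job2
       ON  job2.order_id = ord2.id
       AND job2.sor_id   = job.sor_id
GROUP BY ord.created_by, ord.id, d.address, ord2.id;
```

On the sample data this should return three rows: 000002-A against 000001-A, and 000003-A against each of the two earlier orders, each with the single job type 000AA0.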

Indexes you will need:

  • orders(dwelling_id, created_on, id)

  • jobs(order_id, sor_id)
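
A sketch of how these indexes could be created, and how the cost could be inspected without executing the query, which was the concern in the question. The index names are made up for this example. EXPLAIN PLAN and DBMS_XPLAN.DISPLAY only show the optimizer's estimate, which depends on up-to-date table statistics.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
CREATE INDEX orders_dwelling_created_ix
  ON orders ( dwelling_id, created_on, id );

CREATE INDEX jobs_order_sor_ix
  ON jobs ( order_id, sor_id );

-- Ask the optimizer for its plan; the query itself is not run.
EXPLAIN PLAN FOR
select * from orders ord
join orders ord2
    on  ord2.dwelling_id = ord.dwelling_id
    and ord2.id < ord.id
    and ord2.created_on <= ord.created_on
    and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2
    on  job2.order_id = ord2.id
    and job2.sor_id   = job.sor_id;

-- Show the estimated rows and cost for each step of the plan.
SELECT * FROM TABLE( DBMS_XPLAN.DISPLAY );
```

The estimated row count on the join between ord and ord2 gives a rough idea of how much the non-unique join on dwelling_id multiplies the rows.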