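I have a situation at work. I work in housing. We raise orders against dwellings (so that our contractors can go out and repair the dwelling).
An order contains one or more jobs. A dwelling has zero, one or more orders raised against it.
Here is a brief data definition. I have simplified the tables, but hopefully you get the idea. An order can contain many jobs, and a dwelling can have many orders.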
CREATE TABLE dwellings (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
address VARCHAR2(100) NOT NULL
);
CREATE TABLE orders (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
created_by VARCHAR2(10) NOT NULL,
created_on DATE NOT NULL,
dwelling_id VARCHAR2(10) NOT NULL REFERENCES dwellings(id)
);
CREATE TABLE jobs (
id VARCHAR2(10) PRIMARY KEY NOT NULL,
sor_id VARCHAR2(10) NOT NULL,
order_id VARCHAR2(10) NOT NULL REFERENCES orders(id)
);
And populated with this sample data:
INSERT INTO dwellings VALUES ('00ABC', '2 The Mews House Little Boston London E1 1EE');
INSERT INTO dwellings VALUES ('5H88H', '3 Electric House Snodsbury S1 1IT');
INSERT INTO orders VALUES ('000001-A', 'CSMITH', DATE '2016-03-10', '00ABC');
INSERT INTO orders VALUES ('000002-A', 'CSMITH', DATE '2016-03-11', '00ABC');
INSERT INTO orders VALUES ('000003-A', 'AJONES', DATE '2016-03-16', '00ABC');
INSERT INTO orders VALUES ('000004-A', 'CSMITH', DATE '2016-03-16', '5H88H');
INSERT INTO jobs VALUES ('001', '000AA0', '000001-A');
INSERT INTO jobs VALUES ('002', '123BB0', '000001-A');
INSERT INTO jobs VALUES ('003', '000AA0', '000002-A');
INSERT INTO jobs VALUES ('004', '787XD7', '000003-A');
INSERT INTO jobs VALUES ('005', '000AA0', '000003-A');
INSERT INTO jobs VALUES ('006', '787XD7', '000004-A');
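An analyst wants to find out about agents who raise orders that are similar to previous orders. What is under scrutiny is SOR_ID, which denotes the type of job. Remember, each order has one or more jobs. So the task is: produce a report showing orders that contain one or more job types duplicated from a previous order on the same dwelling.
The report I am producing will have these column headings.
Here is the start of a query. I have not executed it against the database because there are 50,000 dwellings, 100,000 orders and 200,000 jobs. I am wary of the table sizes because I am joining on columns that are not unique.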
select * from orders ord
join orders ord2 on ord.dwelling_id = ord2.dwelling_id --shaky
and ord.id <> ord2.id
and ord.created_on - ord2.created_on between 0 and 90
join jobs job on job.order_id = ord.id
join jobs job2 on job2.order_id = ord2.id
where job.sor_id = job2.sor_id
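I am looking for advice on how to refactor this query into something more manageable (without PL/SQL). Note that I am not using LAG / LEAD, and I have not yet used LISTAGG to collapse the job type codes. That will come later. I am concerned about how expensive the query currently is.
Answer 0 (score: 1)
Query: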
SELECT o.created_by AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY o.created_by, d.address, j.sor_id
HAVING COUNT(1) > 1;
Output:
AGENT_NAME ADDRESS                                      ORDER_IDS         JOB_TYPE
---------- -------------------------------------------- ----------------- ----------
CSMITH     2 The Mews House Little Boston London E1 1EE 000001-A,000002-A 000AA0
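This lists the job types that appear in different orders placed by the same agent at the same address. The order IDs are given in chronological order in a comma-separated list.
However, if you want one row per order with the previous order alongside it, as in your column headings, you can do this: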
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY o.created_by, d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Which would output:
AGENT_NAME ID         ADDRESS                                      PREVIOUS_ORDER_ID JOB_TYPE
---------- ---------- -------------------------------------------- ----------------- ----------
CSMITH     000002-A   2 The Mews House Little Boston London E1 1EE 000001-A          000AA0
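If you want to compare orders across different agents, remove o.created_by from the GROUP BY or PARTITION BY clause. For the first query, you then need LISTAGG to collect all of the agents, like this: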
SELECT LISTAGG( o.created_by, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS agent_name,
d.address,
LISTAGG( o.id, ',' ) WITHIN GROUP ( ORDER BY o.created_on ) AS order_ids,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
GROUP BY d.address, j.sor_id
HAVING COUNT(1) > 1;
Or, for the second query, like this:
SELECT *
FROM (
SELECT o.created_by AS agent_name,
o.id,
d.address,
LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
ORDER BY o.created_on
) AS previous_order_id,
j.sor_id AS job_type
FROM dwellings d
INNER JOIN orders o
ON ( o.dwelling_id = d.id )
INNER JOIN jobs j
ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL;
Both queries would then also output order 000003-A, which was placed by AJONES.
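Neither query above applies the 90-day window from the query in the question. A minimal sketch of one way to add it to the LAG version follows, assuming that "previous order" means the most recent earlier order of the same job type at the same address. The previous_created_on alias and the o.id tie-breaker in the ORDER BY are my own additions for illustration and are not part of the original schema or report.

```sql
SELECT *
FROM (
  SELECT o.created_by AS agent_name,
         o.id,
         o.created_on,
         d.address,
         -- Most recent earlier order with the same job type at this address.
         LAG( o.id ) OVER ( PARTITION BY d.address, j.sor_id
                            ORDER BY o.created_on, o.id
                          ) AS previous_order_id,
         -- Hypothetical helper column: the date of that earlier order.
         LAG( o.created_on ) OVER ( PARTITION BY d.address, j.sor_id
                                    ORDER BY o.created_on, o.id
                                  ) AS previous_created_on,
         j.sor_id AS job_type
  FROM   dwellings d
         INNER JOIN orders o
         ON ( o.dwelling_id = d.id )
         INNER JOIN jobs j
         ON ( j.order_id = o.id )
)
WHERE previous_order_id IS NOT NULL
AND   created_on - previous_created_on <= 90; -- DATE subtraction yields days
```

Because LAG returns the nearest earlier row in the partition, checking only that row is enough: if the nearest earlier order is more than 90 days old, every older one is too.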
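Answer 1 (score: 1)
I would try changing:
ord.id <> ord2.id
to: ord2.id < ord.id
(not sure whether this is suitable for your data, since it assumes that order IDs increase over time)
ord.created_on - ord2.created_on between 0 and 90
to: ord2.created_on <= ord.created_on and ord2.created_on >= ord.created_on - 90
(not sure whether the RDBMS can do this optimization by itself)
and moving job.sor_id = job2.sor_id
into the ON clause (although the RDBMS will probably do this for you):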
select * from orders ord
join orders ord2
on ord2.dwelling_id = ord.dwelling_id
and ord2.id < ord.id
and ord2.created_on <= ord.created_on
and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2
on job2.order_id = ord2.id
and job2.sor_id = job.sor_id;
Indexes you will need:
orders(dwelling_id, created_on, id)
jobs(order_id, sor_id)
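As a rough sketch, the two indexes could be created and the plan inspected as follows. The index names are hypothetical, and whether the optimizer actually uses these indexes depends on your statistics and data distribution, so treat the plan output as something to check rather than a guarantee.

```sql
-- Hypothetical index names; the column lists are the ones given above.
CREATE INDEX orders_dwelling_created_idx ON orders (dwelling_id, created_on, id);
CREATE INDEX jobs_order_sor_idx ON jobs (order_id, sor_id);

-- Ask Oracle for the estimated plan without executing the query.
EXPLAIN PLAN FOR
select * from orders ord
join orders ord2
  on ord2.dwelling_id = ord.dwelling_id
 and ord2.id < ord.id
 and ord2.created_on <= ord.created_on
 and ord2.created_on >= ord.created_on - 90
join jobs job on job.order_id = ord.id
join jobs job2
  on job2.order_id = ord2.id
 and job2.sor_id = job.sor_id;

-- Display the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Since EXPLAIN PLAN only estimates the cost and does not run the statement, it is a safe way to compare the original and the rewritten query against the full-size tables.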