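I have a heavy query (it takes 15 minutes to run), but it returns more results than I need. It's a CONNECT BY query, and nodes that are already descendants of one root also come back as roots of their own. I.e.: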
Ted
  Bob
    John
Bob
  John
John
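Normally, the way to resolve this is with a START WITH condition, typically requiring that the node's parent is null. But due to the nature of the query, I don't have the START WITH values to compare against until I have the full result set. I'm basically trying to double-query my results in order to say "query this stuff, START WITH the records that aren't in that stuff."
Here's the query (built with help from Nicholas Krasnov, here: Oracle Self-Join on multiple possible column matches - CONNECT BY?):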
select cudroot.root_user, cudroot.node_level, cudroot.user_id, cudroot.new_user_id,
       cudbase.* -- Not really, just simplifying
  from css.user_desc cudbase
  join (select connect_by_root(user_id) root_user,
               user_id user_id,
               new_user_id new_user_id,
               level node_level
          from (select cudordered.user_id,
                       coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
                  from (select cud.user_id,
                               cud.new_user_id,
                               decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
                          from css.user_desc cud
                          left join gsu.stg_userdata gstgu
                            on (gstgu.user_id = cud.user_id
                                or (gstgu.sap_asoc_global_id = cud.global_hr_id))
                         where upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
        connect by nocycle user_id = prior new_user_id) cudroot
    on cudbase.user_id = cudroot.user_id
 order by
       cudroot.root_user, cudroot.node_level, cudroot.user_id;
This gives me results for related users (linked by user_id renames or by an associated SAP ID) that look like this:
ROOT_ID   LEVEL  USER_ID   NEW_USER_ID
------------------------------------------------
A5093522  1      A5093522  FG096489
A5093522  2      FG096489  A5093665
A5093522  3      A5093665
FG096489  1      FG096489  A5093665
FG096489  2      A5093665
A5093665  1      A5093665
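I need a way to filter the first join (select connect_by_root(user_id)... so that FG096489 and A5093665 are excluded from the list of roots.
The best START WITH I can think of would look like this (not tested yet):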
start with user_id not in (select new_user_id
                             from (select coalesce(cudordered.new_user_id, cudordered.nextUser) new_user_id
                                     from (select cud.new_user_id,
                                                  decode(cud.global_hr_id, null, null, lead(cud.user_id ignore nulls) over (partition by cud.global_hr_id order by cud.user_id)) nextUser
                                             from css.user_desc cud
                                            where upper(cud.user_type_code) in ('EMPLOYEE','CONTRACTOR','DIV_EMPLOYEE','DIV_CONTRACTOR','DIV_MYTEAPPROVED')) cudordered)
                           connect by nocycle user_id = prior new_user_id)
...but then I'm effectively running the 15-minute query twice.
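I've looked at using partitions in the query, but there isn't really anything to partition by... I need to look at the full result set of new_user_ids. I've also explored analytic functions like rank()... my bag of tricks is empty.
Any ideas?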
Clarification
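The reason I don't want the extra records in the list of roots is that I want exactly one group of results per user. I.e., if Bob Smith has had four accounts over his career (people come and go frequently, as employees and/or contractors), I want to work with a single set containing all of the accounts that belong to Bob Smith.
If Bob came here as a contractor, converted to an employee, left, came back as a contractor in another country, and then left and returned to a legal organization that is now in our SAP system, his account rename chain might look like: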
Bob Smith  CONTRACTOR  ----    US0T0001 -> US001101  (given a new ID as an employee)
Bob Smith  EMPLOYEE    ----    US001101 -> EB0T0001  (contractor ID for the UK)
Bob Smith  CONTRACTOR  SAP001  EB0T0001              (no rename performed)
Bob Smith  EMPLOYEE    SAP001  TE110001              (currently-active ID)
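In the example above, the four accounts are linked either by the new_user_id field, which is set when a user is renamed, or by sharing the same SAP ID.
Because HR frequently fails to follow the business process, a returning user could end up having any one of those four IDs restored to them. I have to analyze all of Bob Smith's IDs and say "Bob Smith can only have TE110001 restored", and kick back an error if they try to restore anything else. I have to do this for 90,000+ records.
The first column, "Bob Smith", is just an identifier for the group of associated accounts. In my original example I use the root user ID as the identifier (e.g. US0T0001). If I used first and last names to identify users, I would end up with collisions.
So Bob Smith would look like this: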
US0T0001  1  CONTRACTOR  ----    US0T0001 -> US001101  (given a new ID as an employee)
US0T0001  2  EMPLOYEE    ----    US001101 -> EB0T0001  (contractor ID for the UK)
US0T0001  3  CONTRACTOR  SAP001  EB0T0001              (no rename performed)
US0T0001  4  EMPLOYEE    SAP001  TE110001              (currently-active ID)
...where 1, 2, 3, and 4 are the levels in the hierarchy.
Since US0T0001, US001101, EB0T0001, and TE110001 are all accounted for, I don't want another group for any of them. But my current results list these accounts in multiple groups:
US001101  1  EMPLOYEE    ----    US001101 -> EB0T0001  (
US001101  2  CONTRACTOR  SAP001  EB0T0001
US001101  3  EMPLOYEE    SAP001  TE110001
EB0T0001  1  CONTRACTOR  SAP001  EB0T0001
EB0T0001  2  EMPLOYEE    SAP001  TE110001
TE110001  1  EMPLOYEE    SAP001  TE110001
This causes two problems:
You asked for an expanded record set... here is some actual data:
-- NumRootUsers tells me how many accounts are associated with a user.
-- The new user ID field is explicitly set in the database, but may be null.
-- The calculated new user ID analyzes records to determine what the next related record is
NumRoot New User Calculated
RootUser Users Level UserId ID Field New User ID SapId LastName FirstName
-----------------------------------------------------------------------------------------------
BG100502 3 1 BG100502 BG1T0873 BG1T0873 GRIENS VAN KION
BG100502 3 2 BG1T0873 BG103443 BG103443 GRIENS VAN KION
BG100502 3 3 BG103443 41008318 VAN GRIENS KION
-- This group causes bad matches for Kion van Griens... the IDs are already accounted for,
-- and this group doesn't even grab all of the accounts for Kion. It's also using a new
-- ID to identify the group
BG1T0873 2 1 BG1T0873 BG103443 BG103443 GRIENS VAN KION
BG1T0873 2 2 BG103443 41008318 VAN GRIENS KION
-- Same here...
BG103443 1 1 BG103443 41008318 VAN GRIENS KION
-- Good group of records
BG100506 3 1 BG100506 BG100778 41008640 MALEN VAN LARS
BG100506 3 2 BG100778 BG1T0877 41008640 MALEN VAN LARS
BG100506 3 3 BG1T0877 41008640 VAN MALEN LARS
-- Bad, unwanted group of records
BG100778 2 1 BG100778 BG1T0877 41008640 MALEN VAN LARS
BG100778 2 2 BG1T0877 41008640 VAN MALEN LARS
-- Third group for Lars
BG1T0877 1 1 BG1T0877 41008640 VAN MALEN LARS
-- Jan... fields are set differently than the above examples, but the chain is calculated correctly
BG100525 3 1 BG100525 BG1T0894 41008651 ZANWIJK VAN JAN
BG100525 3 2 BG1T0894 TE035165 TE035165 41008651 VAN ZANWIJK JAN
BG100525 3 3 TE035165 41008651 VAN ZANWIJK JAN
-- Bad
BG1T0894 2 1 BG1T0894 TE035165 TE035165 41008651 VAN ZANWIJK JAN
BG1T0894 2 2 TE035165 41008651 VAN ZANWIJK JAN
-- Bad bad
TE035165 1 1 TE035165 41008651 VAN ZANWIJK JAN
-- Somebody goofed and gave Ziano a second SAP ID... but we still matched correctly
BG100527 3 1 BG100527 BG1T0896 41008652 STEFANI DE ZIANO
BG100527 3 2 BG1T0896 TE033030 TE033030 41008652 STEFANI DE ZIANO
BG100527 3 3 TE033030 42006172 DE STEFANI ZIANO
-- And we still got extra, unwanted groups
BG1T0896 3 2 BG1T0896 TE033030 TE033030 41008652 STEFANI DE ZIANO
BG1T0896 3 3 TE033030 42006172 DE STEFANI ZIANO
TE033030 3 3 TE033030 42006172 DE STEFANI ZIANO
-- Mark's a perfect example of the missing/frustrating data I'm dealing with... but we still matched correctly
BG102188 3 1 BG102188 BG1T0543 41008250 BULINS MARK
BG102188 3 2 BG1T0543 TE908583 41008250 BULINS R.J.M.A.
BG102188 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
-- Not wanted
BG1T0543 3 2 BG1T0543 TE908583 41008250 BULINS R.J.M.A.
BG1T0543 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
TE908583 3 3 TE908583 41008250 BULINS RICHARD JOHANNES MARTINUS ALPHISIUS
-- One more for good measure
BG1T0146 3 1 BG1T0146 BG105905 BG105905 LUIJENT VALERIE
BG1T0146 3 2 BG105905 TE034165 42006121 LUIJENT VALERIE
BG1T0146 3 3 TE034165 42006121 LUIJENT VALERIE
BG105905 3 2 BG105905 TE034165 42006121 LUIJENT VALERIE
BG105905 3 3 TE034165 42006121 LUIJENT VALERIE
TE034165 3 3 TE034165 42006121 LUIJENT VALERIE
Not sure whether all of this information makes things clearer or just makes your eyes roll back into your head :)
Thanks for looking!
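Answer 0 (score: 1)
I think I've got it. We have let ourselves get fixated on chronological order when, in fact, it doesn't matter. Your START WITH clause should be 'NEW_USER_ID IS NULL'.
To get the rows in chronological order, you can 'ORDER BY cudroot.node_level * -1'.
I would also suggest looking at a WITH clause to build the base data and then running the hierarchical query against it.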
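As a rough sketch of how these suggestions might fit together: the WITH clause below holds the flattened (user_id, new_user_id) links, and the hierarchy is walked backward from the rows whose NEW_USER_ID is null. The inline rows are the sample IDs from the question and stand in for the real, expensive subquery. The name links, the group_id alias, and the reversed CONNECT BY direction are assumptions made for illustration and are not spelled out in this answer.

```sql
-- Hypothetical stand-in for the expensive inner query: one row per account,
-- with the ID it was renamed or linked to (null for the last account in a chain).
with links as (
  select 'A5093522' user_id, 'FG096489' new_user_id from dual union all
  select 'FG096489', 'A5093665' from dual union all
  select 'A5093665', cast(null as varchar2(8)) from dual
)
select connect_by_root(user_id) group_id,   -- the final ID identifies the group
       level                    node_level, -- 1 = newest account in the chain
       user_id,
       new_user_id
  from links
 start with new_user_id is null             -- begin at the end of each chain
connect by nocycle prior user_id = new_user_id  -- walk back to older accounts
 order by group_id, node_level desc;        -- oldest account first
```

Against this sample data the query should return a single group of three rows, identified by the last ID in the chain (A5093665) rather than the first. A chain that loops back on itself has no row with a null NEW_USER_ID, so this START WITH would not pick it up.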
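Answer 1 (score: 1)
What you need here may be multiple queries. Each query would find a subset of the records you are looking for. Hopefully each one would be simpler and faster than a single, giant query. Something like:
(These are off-the-cuff examples.)
I think part of what makes this puzzle hard is that the problem space is too large. By breaking the problem into smaller pieces, each piece becomes doable.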
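One hypothetical way to split the work along these lines (a sketch under stated assumptions, not code taken from this answer): compute the flattened links once, pick out the true roots as the IDs that no other row points to, and only then walk the hierarchy. The links and roots names and the inline sample rows are assumptions for illustration. In practice links would be the question's expensive inner query, possibly loaded into a staging table first so that it is only evaluated once.

```sql
-- Step 1: the flattened links (hypothetical stand-in for the expensive subquery).
with links as (
  select 'A5093522' user_id, 'FG096489' new_user_id from dual union all
  select 'FG096489', 'A5093665' from dual union all
  select 'A5093665', cast(null as varchar2(8)) from dual
),
-- Step 2: a true root is an ID that never appears as anyone's new_user_id.
roots as (
  select l.user_id
    from links l
   where not exists (select 1
                       from links p
                      where p.new_user_id = l.user_id)
)
-- Step 3: walk forward from the true roots only.
select connect_by_root(l.user_id) root_user,
       level                      node_level,
       l.user_id,
       l.new_user_id
  from links l
 start with l.user_id in (select user_id from roots)
connect by nocycle l.user_id = prior l.new_user_id
 order by root_user, node_level;
```

Finding the roots this way needs only an anti-join over the flat link set, not a second hierarchical pass. With the sample rows, only A5093522 qualifies as a root, so the extra groups rooted at FG096489 and A5093665 should not appear. A set of accounts that forms a closed loop has no such root and would need separate handling.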