I have a table EMPLOYEE, as follows:
Enroll Date  STS  EMP_ID  EMP_Name  DEPT    Rank  OST  BLOCK
12-Jan-17    Q    123     ABC       ABC123  12    Y    1000
14-Jan-17    Q    123     ABC       DEF123  12    Y    1000
15-Jan-17    R    123     ABC       DEF123  12    Y    100
15-Jan-17    R    123     ABC       DEF123  12    Y    200
15-Jan-17    R    123     ABC       DEF123  12    Y    300
20-Jan-17    R    123     ABC       DEF123  10    Y    300
26-Jan-17    R    456     RST       DEF456  8     N    200
26-Jan-17    R    456     RST       DEF456  8     N    300
2-Feb-17     Q    123     ABC       ABC123  12    Y    300
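Now I need to delete the duplicate rows for each emp_id (rows count as duplicates if EMP_Name, DEPT, OST and rank are the same). If two rows have the same values in these 4 fields but different enroll_date values, I don't need to delete either row. If two rows have the same enroll date and the same 4 fields (OST, EMP_Name, DEPT and rank), I need to keep the row with the highest block (1000, then 300, then 200, and so on). So after deleting this data, my table should have these rows: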
Enroll Date  STS  EMP_ID  EMP_Name  DEPT    Rank  OST  BLOCK
12-Jan-17    Q    123     ABC       ABC123  12    Y    1000
14-Jan-17    Q    123     ABC       DEF123  12    Y    1000
15-Jan-17    R    123     ABC       DEF123  12    Y    100
2-Feb-17     Q    123     ABC       ABC123  12    Y    300
20-Jan-17    R    123     ABC       DEF123  10    Y    300
26-Jan-17    R    456     RST       DEF456  8     N    200
26-Jan-17    R    456     RST       DEF456  8     N    300
I tried the following query, intending to delete the rows where rn > 1:
SELECT enroll_date, STS, BLOCK, EMP_ID, EMP_NAME, DEPT, RANK, OST, row_number() over (partition BY emp_id, enroll_date, emp_name, dept, ost, rank ORDER BY enroll_date ASC, block DESC) rn FROM employee
But rn comes out as 1 for every row.
Can someone spot the problem here or suggest another approach?
Answer 0: (score: 0)
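It looks like your enroll_date values have non-midnight times, so partitioning by those values makes every combination unique (even though you can't see that when you only display the date part).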
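One way to check that guess, sketched here on the assumption that enroll_date is an Oracle DATE column in the employee table from the question, is to count the rows whose value differs from its own truncated date and to display the full timestamp:

```sql
-- Rows whose enroll_date carries a time-of-day other than midnight.
-- A non-zero count would support the explanation above.
SELECT count(*) AS rows_with_time
FROM employee
WHERE enroll_date != trunc(enroll_date);

-- Show the hidden time portion next to the date-only display.
SELECT to_char(enroll_date, 'DD-Mon-RR') AS date_only,
       to_char(enroll_date, 'YYYY-MM-DD HH24:MI:SS') AS full_value
FROM employee
ORDER BY enroll_date;
```

If the count is zero, the times are all midnight and the cause of the all-1 rn values would have to be something else.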
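My initial thought was that your analytic row_number() was being partitioned by too many columns, and that you shouldn't include the date value you want to order by: it doesn't really make sense to partition by and order by the same thing, since it will have a single value within each partition. Reducing the partition to the columns you actually want to check, perhaps: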
row_number() over (partition BY emp_id, emp_name, dept, ost, rank
ORDER BY enroll_date ASC, block DESC)
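would produce varying row numbers rather than all 1s. But I don't think that's right; it would probably make your secondary ordering by block somewhat redundant, since you're unlikely to have two rows with exactly the same time for one ID. Unlikely perhaps, but not impossible.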
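Re-reading your wording, I think you don't want to order by enroll_date at all, and you do want to partition by the date; but given that it contains non-midnight times that you apparently want to ignore for this exercise, the partition has to be on the truncated date (trunc() by default takes the time back to midnight):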
row_number() over (partition BY trunc(enroll_date), emp_id, emp_name, dept, ost, rank
ORDER BY block DESC)
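With your sample data as a CTE, including slightly different times within each day, and an extra row that matches an existing one in everything apart from the date, this shows your original rn and my two calculated values: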
with employee (enroll_date, sts, emp_id, emp_name, dept, rank, ost, block) as (
select to_date('12-Jan-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'Q', 123, 'ABC', 'ABC123', 12, 'Y', 1000 from dual
union all select to_date('14-Jan-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'Q', 123, 'ABC', 'DEF123', 12, 'Y', 1000 from dual
union all select to_date('15-Jan-17 00:00:01', 'DD-Mon-RR HH24:MI:SS'), 'R', 123, 'ABC', 'DEF123', 12, 'Y', 100 from dual
union all select to_date('15-Jan-17 00:00:02', 'DD-Mon-RR HH24:MI:SS'), 'R', 123, 'ABC', 'DEF123', 12, 'Y', 200 from dual
union all select to_date('15-Jan-17 00:00:03', 'DD-Mon-RR HH24:MI:SS'), 'R', 123, 'ABC', 'DEF123', 12, 'Y', 300 from dual
union all select to_date('20-Jan-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'R', 123, 'ABC', 'DEF123', 10, 'Y', 300 from dual
union all select to_date('26-Jan-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'R', 456, 'RST', 'DEF456', 8, 'N', 200 from dual
union all select to_date('26-Jan-17 00:00:01', 'DD-Mon-RR HH24:MI:SS'), 'R', 456, 'RST', 'DEF456', 8, 'N', 300 from dual
union all select to_date('2-Feb-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'Q', 123, 'ABC', 'ABC123', 12, 'Y', 300 from dual
union all select to_date('3-Feb-17 00:00:00', 'DD-Mon-RR HH24:MI:SS'), 'Q', 123, 'ABC', 'ABC123', 12, 'Y', 300 from dual
)
SELECT to_char(enroll_date, 'DD-Mon-RR') as date_only,
enroll_date, sts, block, emp_id, emp_name, dept, rank, ost,
row_number() over ( partition BY emp_id, enroll_date, emp_name, dept, ost, rank
ORDER BY enroll_date ASC, block DESC) your_rn,
row_number() over (partition BY emp_id, emp_name, dept, ost, rank
ORDER BY enroll_date ASC, block DESC) my_rn_1,
row_number() over (partition BY trunc(enroll_date), emp_id, emp_name, dept, ost, rank
ORDER BY block DESC) as my_rn_2
FROM employee
ORDER BY enroll_date;
DATE_ONLY ENROLL_DATE S BLOCK EMP_ID EMP DEPT RANK O YOUR_RN MY_RN_1 MY_RN_2
--------- ------------------- - ----- ------ --- ------ ---- - ------- ------- -------
12-Jan-17 2017-01-12 00:00:00 Q 1000 123 ABC ABC123 12 Y 1 1 1
14-Jan-17 2017-01-14 00:00:00 Q 1000 123 ABC DEF123 12 Y 1 1 1
15-Jan-17 2017-01-15 00:00:01 R 100 123 ABC DEF123 12 Y 1 2 3
15-Jan-17 2017-01-15 00:00:02 R 200 123 ABC DEF123 12 Y 1 3 2
15-Jan-17 2017-01-15 00:00:03 R 300 123 ABC DEF123 12 Y 1 4 1
20-Jan-17 2017-01-20 00:00:00 R 300 123 ABC DEF123 10 Y 1 1 1
26-Jan-17 2017-01-26 00:00:00 R 200 456 RST DEF456 8 N 1 1 2
26-Jan-17 2017-01-26 00:00:01 R 300 456 RST DEF456 8 N 1 2 1
02-Feb-17 2017-02-02 00:00:00 Q 300 123 ABC ABC123 12 Y 1 2 1
03-Feb-17 2017-02-03 00:00:00 Q 300 123 ABC ABC123 12 Y 1 3 1
To identify the rows to delete, you can use a subquery:
SELECT enroll_date, sts, block, emp_id, emp_name, dept, rank, ost
FROM (
SELECT enroll_date, sts, block, emp_id, emp_name, dept, rank, ost,
row_number() over (partition BY trunc(enroll_date), emp_id, emp_name, dept, ost, rank
ORDER BY block DESC) as my_rn_2
FROM employee
)
WHERE my_rn_2 > 1
ORDER BY enroll_date;
ENROLL_DATE S BLOCK EMP_ID EMP DEPT RANK O
------------------- - ----- ------ --- ------ ---- -
2017-01-15 00:00:01 R 100 123 ABC DEF123 12 Y
2017-01-15 00:00:02 R 200 123 ABC DEF123 12 Y
2017-01-26 00:00:00 R 200 456 RST DEF456 8 N
You need to decide what makes sense for your data and requirements.
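The subquery above only lists the candidate rows. A minimal sketch of turning it into an actual delete, assuming employee is an ordinary Oracle table (so that the rowid pseudocolumn is available) and that the truncated-date partitioning is the rule you settle on:

```sql
-- Delete every row that is not ranked first within its
-- (day, emp_id, emp_name, dept, ost, rank) group, highest block first.
-- Rows with equal block in the same group are ordered arbitrarily,
-- so exactly one of them survives.
DELETE FROM employee
WHERE rowid IN (
  SELECT rid
  FROM (
    SELECT rowid AS rid,
           row_number() OVER (PARTITION BY trunc(enroll_date), emp_id, emp_name, dept, ost, rank
                              ORDER BY block DESC) AS rn
    FROM employee
  )
  WHERE rn > 1
);
```

It is probably worth running the SELECT version first and comparing its output with the rows you expect to lose, and only committing once the result looks right.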
Answer 1: (score: 0)
I'm creating a temporary table that will contain all the non-duplicate values:
create table employee_temp as
with duplicates as (
  SELECT enroll_date, STS, BLOCK, EMP_ID, EMP_NAME, DEPT, RANK, OST,
    row_number() over (partition BY emp_id, trunc(enroll_date), emp_name, dept, ost, rank
      ORDER BY enroll_date ASC, block DESC) rn
  FROM employee
)
SELECT enroll_date, STS, BLOCK, EMP_ID, EMP_NAME, DEPT, RANK, OST
FROM duplicates
WHERE rn = 1;
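The answer stops at building employee_temp. One possible way to finish, sketched under the assumptions that employee has only the eight columns shown in the question and that nothing else modifies the table in the meantime, is to copy the kept rows back. Note also that the query above orders by enroll_date before block, so when times differ within a day it keeps the earliest row rather than the highest block; dropping enroll_date from the ORDER BY may be closer to the stated requirement.

```sql
-- Assumption: employee_temp was created by the CREATE TABLE ... AS statement above.
-- DELETE (rather than TRUNCATE) keeps the step inside a transaction,
-- so it can still be rolled back before the COMMIT.
DELETE FROM employee;

INSERT INTO employee (enroll_date, sts, block, emp_id, emp_name, dept, rank, ost)
SELECT enroll_date, sts, block, emp_id, emp_name, dept, rank, ost
FROM employee_temp;

COMMIT;

-- The helper table is no longer needed once the rows are back.
DROP TABLE employee_temp;
```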