I have a table like this:
create table issue_attributes (
issue_id number,
attr_timestamp timestamp,
attribute_name varchar2(500),
attribute_value varchar2(500),
CONSTRAINT ia_pk PRIMARY KEY (issue_id, attr_timestamp, attribute_name)
);
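The idea is to have a set of attributes (status, owner, and so on) associated with an issue, while keeping a history of how each attribute changes.
Because of a data import error, the table now contains duplicated data: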
select issue_id, attr_timestamp, attribute_name, attribute_value
from issue_attributes where issue_id = 1 and attribute_name = 'OWNER';
This produces the following sample data:
1, 01-JAN-2011 12:00, 'OWNER', 'john.doe@example.com'
1, 01-FEB-2011 12:00, 'OWNER', 'john.doe@example.com'
1, 01-MAR-2011 12:00, 'OWNER', 'john.doe@example.com'
1, 01-APR-2011 12:00, 'OWNER', 'john.doe@example.com'
I'd like to find every instance of a repeated attribute value and remove the repeats. For this sample data, the desired result set would be:
1, 01-JAN-2011 12:00, 'OWNER', 'john.doe@example.com'
We may also have sample data like this:
2, 01-JAN-2011 12:00, 'OWNER', 'john.doe@example.com'
2, 01-FEB-2011 12:00, 'OWNER', 'jane.deere@example.com'
2, 01-MAR-2011 12:00, 'OWNER', 'john.doe@example.com'
2, 01-APR-2011 12:00, 'OWNER', 'john.doe@example.com'
In this case, I would expect this result:
2, 01-JAN-2011 12:00, 'OWNER', 'john.doe@example.com'
2, 01-FEB-2011 12:00, 'OWNER', 'jane.deere@example.com'
2, 01-MAR-2011 12:00, 'OWNER', 'john.doe@example.com'
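This is on Oracle 11g, so I can use either SQL or PL/SQL to fix the data. One approach I considered is PL/SQL: for each issue_id, sort the attributes by timestamp and, if attribute(x) = attribute(x-1), delete attribute(x). That feels somewhat brute-force, and I'd like to know whether there is an elegant way to do this in plain SQL.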
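A minimal sketch of that row-by-row idea, written in Python over an in-memory list instead of PL/SQL so it can be run anywhere. The function name `collapse_runs` and the ISO-formatted timestamp strings are illustrative assumptions. It keeps the first row of each run of equal values, which matches the expected results above.

```python
from itertools import groupby

# Rows mirror the table columns: (issue_id, attr_timestamp, attribute_name, attribute_value).
# Timestamps are ISO-8601 strings, so sorting them as text sorts them chronologically.
SAMPLE = [
    (1, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-02-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-02-01 12:00", "OWNER", "jane.deere@example.com"),
    (2, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
]

_NO_PREVIOUS = object()  # sentinel: "no previous row", distinct from a None value


def collapse_runs(rows):
    """Return the rows to keep: the first row of every run of equal values.

    Rows are grouped per (issue_id, attribute_name) and walked in timestamp
    order; a row is dropped when its value equals the previous row's value.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[1]))
    kept = []
    for _, group in groupby(ordered, key=lambda r: (r[0], r[2])):
        previous = _NO_PREVIOUS
        for row in group:
            if row[3] != previous:
                kept.append(row)
            previous = row[3]
    return kept


for row in collapse_runs(SAMPLE):
    print(row)
```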
Answer 0 (score: 1)
Here is a nice "Oracle" way of doing this.
Using your sample data:
SQL> desc issue_attributes
Name Null? Type
----------------------------------------------------------------- -------- --------------------------------------------
ISSUE_ID NUMBER
ATTR_TIMESTAMP TIMESTAMP(6)
ATTRIBUTE_NAME VARCHAR2(500)
ATTRIBUTE_VALUE VARCHAR2(500)
SQL> select * from issue_attributes;
ISSUE_ID ATTR_TIMESTAMP ATTRIBUTE_ ATTRIBUTE_VALUE
---------- ----------------------------------- ---------- ------------------------------
1 01-JAN-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-FEB-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-MAR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-APR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-JAN-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-JAN-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-FEB-20 11.12.00.000000 AM OWNER jane.deere@example.com
1 01-MAR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-APR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-JAN-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-FEB-20 11.12.00.000000 AM OWNER jane.deere@example.com
1 01-MAR-20 11.12.00.000000 AM OWNER john.doe@example.com
12 rows selected.
SQL> delete from issue_attributes
where rowid in(select rid
from (select rowid rid,
row_number() over (partition by ISSUE_ID,
ATTR_TIMESTAMP,
ATTRIBUTE_NAME,
ATTRIBUTE_VALUE
order by rowid) rn
from issue_attributes)
where rn<> 1);
7 rows deleted.
SQL> select * from issue_attributes;
ISSUE_ID ATTR_TIMESTAMP ATTRIBUTE_ ATTRIBUTE_VALUE
---------- ----------------------------------- ---------- ------------------------------
1 01-JAN-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-FEB-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-MAR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-APR-20 11.12.00.000000 AM OWNER john.doe@example.com
1 01-FEB-20 11.12.00.000000 AM OWNER jane.deere@example.com
5 rows selected.
Hope that helps.
Answer 1 (score: 1)
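The ROW_NUMBER() technique above can be tried locally with Python's built-in sqlite3 module, used here only as a stand-in for Oracle (assumption: SQLite 3.25 or later, which added window functions). As the sketch shows, it removes only rows that are identical in every column, including attr_timestamp; rows that differ only in timestamp, like the repeats in the question, all survive.

```python
import sqlite3

# Assumption: SQLite 3.25+ (window functions) as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
# No primary key, as in the answer's test table. With the question's PK on
# (issue_id, attr_timestamp, attribute_name), exact duplicates could not exist.
conn.execute("""
    CREATE TABLE issue_attributes (
        issue_id        INTEGER,
        attr_timestamp  TEXT,
        attribute_name  TEXT,
        attribute_value TEXT)
""")
rows = [
    (1, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),  # exact duplicate
    (1, "2011-02-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
]
conn.executemany("INSERT INTO issue_attributes VALUES (?, ?, ?, ?)", rows)

# Same shape as the answer: number the copies of each fully identical row
# and delete every copy except the first.
deleted = conn.execute("""
    DELETE FROM issue_attributes
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY issue_id, attr_timestamp,
                                    attribute_name, attribute_value
                       ORDER BY rowid) AS rn
            FROM issue_attributes)
        WHERE rn <> 1)
""").rowcount

remaining = conn.execute("SELECT COUNT(*) FROM issue_attributes").fetchone()[0]
print(deleted, remaining)  # 1 4
```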
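I would look at the previous row and check whether the data has changed. This can be done with the LAG analytic function.
You can look back at the previous value, ordered by timestamp. If the data has changed, you want to keep the row. The first row is always kept, because LAG returns NULL when there is no previous row.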
with issue_attributes as (
select 1 as issue_id, date '2011-01-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 1 as issue_id, date '2011-02-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 1 as issue_id, date '2011-03-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 1 as issue_id, date '2011-04-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 2 as issue_id, date '2011-01-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 2 as issue_id, date '2011-02-01' as attr_timestamp,
'OWNER' as attribute_name, 'jane.deere@example.com' as attribute_value from dual union all
select 2 as issue_id, date '2011-03-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual union all
select 2 as issue_id, date '2011-04-01' as attr_timestamp,
'OWNER' as attribute_name, 'john.doe@example.com' as attribute_value from dual
)
select
issue_id,
attr_timestamp,
attribute_name,
attribute_value,
case when lag(attribute_value) over (partition by issue_id, attribute_name order by attr_timestamp) = attribute_value then null else 'Y' end as keep_value
from
issue_attributes
This adds an extra column indicating whether the row should be kept, which you can then filter on:
ISSUE_ID ATTR_TIMESTAMP ATTRIBUTE_NAME ATTRIBUTE_VALUE KEEP_VALUE
1 01/01/2011 OWNER john.doe@example.com Y
1 01/02/2011 OWNER john.doe@example.com
1 01/03/2011 OWNER john.doe@example.com
1 01/04/2011 OWNER john.doe@example.com
2 01/01/2011 OWNER john.doe@example.com Y
2 01/02/2011 OWNER jane.deere@example.com Y
2 01/03/2011 OWNER john.doe@example.com Y
2 01/04/2011 OWNER john.doe@example.com
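To go from the flag to an actual cleanup, one option is to delete the rows whose LAG value equals their own value, i.e. the rows without a 'Y'. The sketch below runs that DELETE against SQLite through Python's sqlite3 module rather than Oracle (an assumption; it needs SQLite 3.25+ for window functions). In Oracle, the same subquery could drive a `delete ... where rowid in (...)`, as in the previous answer.

```python
import sqlite3

# Sketch only: SQLite (3.25+ for window functions) stands in for Oracle.
# attr_timestamp is stored as ISO-8601 text, which sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE issue_attributes (
        issue_id        INTEGER,
        attr_timestamp  TEXT,
        attribute_name  TEXT,
        attribute_value TEXT,
        PRIMARY KEY (issue_id, attr_timestamp, attribute_name))
""")
conn.executemany("INSERT INTO issue_attributes VALUES (?, ?, ?, ?)", [
    (1, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-02-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-02-01 12:00", "OWNER", "jane.deere@example.com"),
    (2, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
])

# Delete every row whose value equals the previous value for the same
# issue/attribute. The first row of each partition has a NULL LAG value,
# so the equality test is never true for it and it is never deleted.
conn.execute("""
    DELETE FROM issue_attributes
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   attribute_value,
                   LAG(attribute_value) OVER (
                       PARTITION BY issue_id, attribute_name
                       ORDER BY attr_timestamp) AS prev_value
            FROM issue_attributes)
        WHERE prev_value = attribute_value)
""")

remaining = conn.execute("""
    SELECT issue_id, attr_timestamp, attribute_value
    FROM issue_attributes
    ORDER BY issue_id, attr_timestamp
""").fetchall()
for row in remaining:
    print(row)
```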
Answer 2 (score: 0)
I don't know Oracle specifically, but something like
SELECT MAX(attr_timestamp), issue_id, attribute_name, attribute_value
FROM issue_attributes
GROUP BY issue_id, attribute_name, attribute_value
would, in some DBMSs, produce a list showing each distinct (issue_id, attribute_name, attribute_value) triple together with its most recent timestamp. It might be worth a try.
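The GROUP BY query is plain SQL, so its effect is easy to check in SQLite through Python's sqlite3 module (used here only as a convenient stand-in). Loaded with the second sample issue, it returns two rows, because the two separate runs of john.doe@example.com fall into the same group; the question's expected result for that issue has three rows.

```python
import sqlite3

# SQLite stands in for Oracle; only the question's issue 2 rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE issue_attributes (
        issue_id        INTEGER,
        attr_timestamp  TEXT,
        attribute_name  TEXT,
        attribute_value TEXT)
""")
conn.executemany("INSERT INTO issue_attributes VALUES (?, ?, ?, ?)", [
    (2, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-02-01 12:00", "OWNER", "jane.deere@example.com"),
    (2, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
])

# The answer's query, with ORDER BY added only to make the output stable.
rows = conn.execute("""
    SELECT MAX(attr_timestamp), issue_id, attribute_name, attribute_value
    FROM issue_attributes
    GROUP BY issue_id, attribute_name, attribute_value
    ORDER BY 1
""").fetchall()
for row in rows:
    print(row)
```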
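Answer 3 (score: 0)
What you want to detect is: tuples with the same {issue_id, attribute_name, attribute_value} for which there is no intervening tuple (when ordered by timestamp) with the same {issue_id, attribute_name} but a different attribute_value.
This can be written as a single query with one EXISTS and one NOT EXISTS subquery.
Update:
-- PostgreSQL only: sets the schema search path; omit this line on Oracle.
SET search_path='tmp';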
-- The rows you want to delete.
SELECT * FROM issue_attributes to_del
WHERE EXISTS (
SELECT * FROM issue_attributes xx
WHERE xx.issue_id = to_del.issue_id
AND xx.attribute_name = to_del.attribute_name
AND xx.attribute_value = to_del.attribute_value
AND xx.attr_timestamp > to_del.attr_timestamp
AND NOT EXISTS ( SELECT * FROM issue_attributes nx
WHERE nx.issue_id = to_del.issue_id
AND nx.attribute_name = to_del.attribute_name
AND nx.attribute_value <> to_del.attribute_value
AND nx.attr_timestamp > to_del.attr_timestamp
AND nx.attr_timestamp < xx.attr_timestamp
)
) ;
-- For completeness: the rows you want to keep.
SELECT * FROM issue_attributes must_stay
WHERE NOT EXISTS (
SELECT * FROM issue_attributes xx
WHERE xx.issue_id = must_stay.issue_id
AND xx.attribute_name = must_stay.attribute_name
AND xx.attribute_value = must_stay.attribute_value
AND xx.attr_timestamp > must_stay.attr_timestamp
AND NOT EXISTS ( SELECT * FROM issue_attributes nx
WHERE nx.issue_id = must_stay.issue_id
AND nx.attribute_name = must_stay.attribute_name
AND nx.attribute_value <> must_stay.attribute_value
AND nx.attr_timestamp > must_stay.attr_timestamp
AND nx.attr_timestamp < xx.attr_timestamp
)
) ;
Results:
issue_id | attr_timestamp | attribute_name | attribute_value
----------+---------------------+----------------+----------------------
1 | 2011-03-01 12:00:00 | OWNER | john.doe@example.com
1 | 2011-01-01 12:00:00 | OWNER | john.doe@example.com
1 | 2011-02-01 12:00:00 | OWNER | john.doe@example.com
2 | 2011-03-01 12:00:00 | OWNER | john.doe@example.com
(4 rows)
issue_id | attr_timestamp | attribute_name | attribute_value
----------+---------------------+----------------+------------------------
1 | 2011-04-01 12:00:00 | OWNER | john.doe@example.com
2 | 2011-02-01 12:00:00 | OWNER | jane.deere@example.com
2 | 2011-04-01 12:00:00 | OWNER | john.doe@example.com
2 | 2011-01-01 12:00:00 | OWNER | john.doe@example.com
(4 rows)
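Applied as a DELETE, the same predicate could be run like the sketch below, against SQLite through Python's sqlite3 module rather than PostgreSQL or Oracle. Wrapping the SELECT in `rowid IN (...)` is a choice made here so that the full set of rows to delete is computed first. Note that this formulation keeps the last row of each run, matching the second result above, whereas the question's expected output keeps the first.

```python
import sqlite3

# SQLite stands in for PostgreSQL/Oracle; timestamps are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE issue_attributes (
        issue_id        INTEGER,
        attr_timestamp  TEXT,
        attribute_name  TEXT,
        attribute_value TEXT,
        PRIMARY KEY (issue_id, attr_timestamp, attribute_name))
""")
conn.executemany("INSERT INTO issue_attributes VALUES (?, ?, ?, ?)", [
    (1, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-02-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (1, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-01-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-02-01 12:00", "OWNER", "jane.deere@example.com"),
    (2, "2011-03-01 12:00", "OWNER", "john.doe@example.com"),
    (2, "2011-04-01 12:00", "OWNER", "john.doe@example.com"),
])

# Delete a row when a later row has the same value and no row with a
# different value lies between them, i.e. keep only the last row of each run.
conn.execute("""
    DELETE FROM issue_attributes
    WHERE rowid IN (
        SELECT to_del.rowid FROM issue_attributes to_del
        WHERE EXISTS (
            SELECT * FROM issue_attributes xx
            WHERE xx.issue_id = to_del.issue_id
              AND xx.attribute_name = to_del.attribute_name
              AND xx.attribute_value = to_del.attribute_value
              AND xx.attr_timestamp > to_del.attr_timestamp
              AND NOT EXISTS (
                  SELECT * FROM issue_attributes nx
                  WHERE nx.issue_id = to_del.issue_id
                    AND nx.attribute_name = to_del.attribute_name
                    AND nx.attribute_value <> to_del.attribute_value
                    AND nx.attr_timestamp > to_del.attr_timestamp
                    AND nx.attr_timestamp < xx.attr_timestamp)))
""")

remaining = conn.execute("""
    SELECT issue_id, attr_timestamp, attribute_value
    FROM issue_attributes
    ORDER BY issue_id, attr_timestamp
""").fetchall()
for row in remaining:
    print(row)
```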