Question

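I would like to know the most efficient SQL for this problem. I have to run a SQL query against a table that contains invalid data, in order to correct all of that invalid data. The table has the following structure: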

TABLE(customer_id, start_date, end_date, type)

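Currently the table can contain many rows for a given tuple (customer_id, type). My query needs to "merge" all rows belonging to one group into a single row, keeping the most recent start date and the most recent end date: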

Cust1;01/01/2012;01/01/2020;1
Cust1;01/01/2010;01/01/2024;1

should be converted into the single row

Cust1;01/01/2012;01/01/2024;1

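I need not only to select the corrected data but to actually fix the table: where a group has more than one row, delete the extra rows while keeping the data collected from each of them. I hope my explanation is clear. I am using Oracle.

Thanks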

Answer 1

Use the max() function.

I think you want to create another table from this data.

Answer 2

Do an aggregation:

select customer_id, max(start_date), max(end_date), type
from table t
group by customer_id, type;

Answer 3

If the number of duplicate rows is small, an in-place update/delete is the better alternative.

So first check how many rows are affected:

with clean as (
select CUSTOMER_ID, TYPE, max(start_date) start_date_clean, max(end_date)  end_date_clean
from tab group by CUSTOMER_ID, TYPE)
select tab.*, start_date_clean, end_date_clean
from tab join clean on tab.CUSTOMER_ID = clean.CUSTOMER_ID and tab.TYPE = clean.TYPE
where  start_date != start_date_clean or  end_date != end_date_clean
;

This query returns all rows that will have to be processed, i.e. those whose start or end date is incorrect.
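The query above lists rows with a wrong date. To size the delete step as well, a rough sketch that counts the groups holding more than one row may help. It assumes the same tab table used in this answer; duplicate_groups, surplus_rows and row_cnt are alias names chosen here for illustration.

```sql
-- Count the (customer_id, type) groups that hold more than one row,
-- and the number of surplus rows a cleanup would have to delete.
select count(*)         as duplicate_groups,
       sum(row_cnt - 1) as surplus_rows
from (
  select customer_id, type, count(*) as row_cnt
  from tab
  group by customer_id, type
  having count(*) > 1
);
```

When there are no duplicates at all, duplicate_groups is 0 and surplus_rows is NULL, because sum() over an empty set returns NULL.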

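If that number is large, proceed as the other answers suggest: copy the table and replace the original table with the copy.

If the number is small, go the update/delete way: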
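The copy approach is not spelled out in this answer. A minimal sketch of how it could look in Oracle follows, assuming the tab table from this answer; tab_clean and tab_old are hypothetical names introduced only for this example.

```sql
-- Build a de-duplicated copy: one row per (customer_id, type).
-- tab_clean is a hypothetical name for the copy.
create table tab_clean as
select customer_id,
       max(start_date) as start_date,
       max(end_date)   as end_date,
       type
from tab
group by customer_id, type;

-- Swap the tables once the copy has been checked.
-- tab_old is a hypothetical name for the retired original.
alter table tab rename to tab_old;
alter table tab_clean rename to tab;
```

Indexes, grants, triggers and most constraints are not carried over by create table ... as select, so they would have to be recreated on the copy before the swap.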

update tab a
set a.START_DATE = (select max(b.START_DATE) from tab b where a.customer_id = b.customer_id and a.type = b.type),
a.END_DATE = (select max(b.END_DATE) from tab b where a.customer_id = b.customer_id and a.type = b.type)
where (a.customer_id, a.type) in 
( 
select tab.CUSTOMER_ID, tab.TYPE 
from tab join 
(select CUSTOMER_ID, TYPE, max(start_date) start_date_clean, max(end_date)  end_date_clean
from tab group by CUSTOMER_ID, TYPE) clean 
on tab.CUSTOMER_ID = clean.CUSTOMER_ID and tab.TYPE = clean.TYPE
where  start_date != start_date_clean or  end_date != end_date_clean);

This updates the start and end dates of all affected rows to the correct values.

Example

CUSTOMER_ID START_DATE          END_DATE                  TYPE
----------- ------------------- ------------------- ----------
          1 01-01-2013 00:00:00 01-01-2016 00:00:00          1 
          1 01-01-2012 00:00:00 01-01-2018 00:00:00          1 
          1 01-01-2010 00:00:00 01-01-2017 00:00:00          1 
          2 01-01-2010 00:00:00 01-01-2018 00:00:00          1 
          3 01-01-2010 00:00:00 01-01-2018 00:00:00          1

is updated to

CUSTOMER_ID START_DATE          END_DATE                  TYPE
----------- ------------------- ------------------- ----------
          1 01-01-2013 00:00:00 01-01-2018 00:00:00          1 
          1 01-01-2013 00:00:00 01-01-2018 00:00:00          1 
          1 01-01-2013 00:00:00 01-01-2018 00:00:00          1 
          2 01-01-2010 00:00:00 01-01-2018 00:00:00          1 
          3 01-01-2010 00:00:00 01-01-2018 00:00:00          1

In the next step the duplicate rows must be deleted. The following delete uses ROW_NUMBER to identify the duplicates:

delete from tab where rowid in 
(select RID from (
  select rowid rid,
  row_number() over (partition by CUSTOMER_ID, TYPE order by null) rn
  from tab) 
where rn > 1)
;
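After the delete, a quick check can confirm that every (customer_id, type) group is down to a single row. This is only a sketch against the same tab table; row_cnt is an alias chosen for illustration, and the commit assumes the update and delete were run in one still-open transaction.

```sql
-- Should return no rows once the cleanup is complete.
select customer_id, type, count(*) as row_cnt
from tab
group by customer_id, type
having count(*) > 1;

-- Make the update and delete permanent once the check passes.
commit;
```

If the check still returns rows, a rollback instead of the commit leaves the table as it was before the update.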

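As you can see, the brute-force copy approach is simple in terms of the query, but it takes the table offline for some time. You need twice the space to perform it, and it will take a while.

The update approach is more complex, but it needs no maintenance window and can complete quickly.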

SQL Merge / Group rows into one

3 answers: