Finding duplicate rows while inserting data into an Oracle table

Time: 2017-07-30 22:22:35

Tags: sql oracle

I have an Oracle table, dm_djr_bulkjob, with the following columns and sample data.

+----------+------+-----------+------------------+------+----------------------+
| device_id| cg_id|firmware_id| best_status      |dmc_id|oltp_updated          |
+----------+------+-----------+------------------+------+----------------------+
| 2009160  | 000  |25822      |No Device Response|1736  |27-JUL-17 10:00:00 AM |
| 2009160  | 000  |25822      |401               |1736  |27-JUL-17 14:00:00 PM |
| 2009157  | 000  |25745      |Wifi Deferred     |1736  |27-JUL-17 02:00:00 AM |
| 2009174  | 000  |25861      |Low Memory        |1736  |27-JUL-17 08:00:00 AM |
+----------+------+-----------+------------------+------+----------------------+

I am inserting data into a temporary table with the following query:

insert into DM_ETLTEMP_BULK_BSTRESULT_SUMM
(
device_id,
cg_id,
firmware_id,
best_status,
dmc_id
)SELECT device_id, cg_id, firmware_id, best_Status, dmc_id
from dm_djr_bulkjob where oltp_updated between '27-JUL-17' and '28-JUL-17'

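This query inserts every record from dm_djr_bulkjob that falls in the date range into the temporary table. From each set of duplicate records I want to keep only one (records count as duplicates when they share the same device_id and firmware_id).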

Duplicate records for device_id= 2009160 and firmware_id = 25822

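Of the two duplicates, I want the record whose best_status has the smallest priority value in the priority table below. For example, the query above returns two duplicate entries for device_id = 2009160 and firmware_id = 25822, and of those two records, best_status = 'No Device Response' has the lowest priority value in the table below.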

So the final set of records inserted into the temporary table should look like this.

+----------+------+-----------+------------------+------+
| device_id| cg_id|firmware_id| best_status      |dmc_id|
+----------+------+-----------+------------------+------+
| 2009160  | 000  |25822      |No Device Response|1736  |
| 2009157  | 000  |25745      |Wifi Deferred     |1736  |
| 2009174  | 000  |25861      |Low Memory        |1736  |
+----------+------+-----------+------------------+------+

Priority table

+-------------------+--------+
| status            |priority|
+-------------------+--------+
|No Device Response |1       |
|401                |2       |
|402                |3       |
|500                |4       |
|Wifi Deferred      |5       |
|Low Memory         |6       |
| No Device Response|7       |
+-------------------+--------+

Please suggest a query that meets this requirement.

Thanks in advance!

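2 Answers:

Answer 0: (score: 1)

I would join to the priority table and add an analytic function to pick the preferred row whenever there are duplicates. The query would look like this: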

Select device_id, cg_id, firmware_id, best_Status, dmc_id 
  From (Select a.device_id, a.cg_id, a.firmware_id, a.best_Status, a.dmc_id,
               rank() Over (Partition By a.device_id, a.firmware_id 
                                Order By b.priority) As rnk
          From dm_djr_bulkjob a
          Join priority_table b on b.status = a.best_status
         Where a.oltp_updated Between '27-JUL-17' And '28-JUL-17')
 Where rnk = 1;

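The rank() function assigns a rank number to each row, so that within each unique (device_id, firmware_id) combination the row with the lowest priority value gets rnk = 1. Note that if two rows in the same group tie on priority, both receive rnk = 1.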
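
For completeness, here is a sketch of the same idea wired into the INSERT from the question. It rests on assumptions that are not in the original post: the priority table is called priority_table with columns status and priority (the question never names it), oltp_updated is a DATE or TIMESTAMP column, row_number() is swapped in for rank() so that a tie on priority still yields exactly one row per group, and the date range is written with DATE literals so it does not depend on the session's NLS_DATE_FORMAT.

```sql
-- Sketch only: priority_table(status, priority) is an assumed name and layout.
INSERT INTO DM_ETLTEMP_BULK_BSTRESULT_SUMM
  (device_id, cg_id, firmware_id, best_status, dmc_id)
SELECT device_id, cg_id, firmware_id, best_status, dmc_id
  FROM (SELECT a.device_id, a.cg_id, a.firmware_id, a.best_status, a.dmc_id,
               -- row_number() numbers rows 1, 2, 3... within each group, so
               -- exactly one row gets rn = 1 even when priorities tie;
               -- oltp_updated is an arbitrary tie-breaker (earliest row wins)
               ROW_NUMBER() OVER (PARTITION BY a.device_id, a.firmware_id
                                      ORDER BY b.priority, a.oltp_updated) AS rn
          FROM dm_djr_bulkjob a
          JOIN priority_table b ON b.status = a.best_status
         -- half-open range: all of 27-JUL-2017, nothing from 28-JUL-2017
         WHERE a.oltp_updated >= DATE '2017-07-27'
           AND a.oltp_updated <  DATE '2017-07-28')
 WHERE rn = 1;
```

Two things differ from the query above and may need adjusting: the half-open range excludes a row stamped exactly at midnight on 28-JUL-17, which BETWEEN would include, and the inner join drops any row whose best_status has no match in the priority table.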

Answer 1: (score: 0)

        INSERT
        INTO DM_ETLTEMP_BULK_BSTRESULT_SUMM
            (
                device_id,
                cg_id,
                firmware_id,
                best_status,
                dmc_id
            )
       SELECT  device_id,
            cg_id,
            firmware_id,
            best_Status,
            dmc_id
        FROM
            (SELECT  device_id,
                    cg_id,
                    firmware_id,
                    best_Status,
                    dmc_id,
                    COUNT(*)
                FROM dm_djr_bulkjob
                WHERE oltp_updated BETWEEN '27-JUL-17' AND '28-JUL-17'
                GROUP BY device_id,
                    cg_id,
                    firmware_id,
                    best_Status,
                    dmc_id
                HAVING COUNT(*)<2
            );

Try this :)
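
One caveat with the query in this answer: it groups on best_status as well, so the two sample rows for device_id = 2009160 fall into separate groups with a count of 1 each, and both pass the HAVING filter. A variant that groups only on device_id and firmware_id and takes the other columns from the best-priority row could look like the sketch below. It assumes a priority table named priority_table with columns status and priority, which the question does not name.

```sql
-- Sketch only: priority_table(status, priority) is an assumed name and layout.
SELECT a.device_id,
       -- KEEP (DENSE_RANK FIRST ...) restricts each aggregate to the rows
       -- with the smallest priority value inside the group
       MIN(a.cg_id)       KEEP (DENSE_RANK FIRST ORDER BY p.priority) AS cg_id,
       a.firmware_id,
       MIN(a.best_status) KEEP (DENSE_RANK FIRST ORDER BY p.priority) AS best_status,
       MIN(a.dmc_id)      KEEP (DENSE_RANK FIRST ORDER BY p.priority) AS dmc_id
  FROM dm_djr_bulkjob a
  JOIN priority_table p ON p.status = a.best_status
 -- same date filter as the answer; relies on implicit string-to-date conversion
 WHERE a.oltp_updated BETWEEN '27-JUL-17' AND '28-JUL-17'
 GROUP BY a.device_id, a.firmware_id;
```

If several rows in a group tie on the smallest priority, each MIN is taken independently, so cg_id and dmc_id could come from different tied rows. The SELECT can be placed after the INSERT column list from the question in the usual way.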