Question

I have a table, transcription, which contains passages of transcribed text and their citations, with the columns:

text, transcription_id(PK), t_notes, citation

The second table, town_transcription, is a relation table that links places referenced in the text (from another table) to that transcription record. It has the columns:

town_id(FK), transcription_id(FK), confidence_interval

Many of these passages reference multiple towns, but stupidly I just duplicated the records and linked each copy to a single town. I identified the duplicated rows of text with the following SQL query:

SELECT * FROM transcription aa WHERE (select count(*) from transcription bb WHERE (bb.text = aa.text) AND (bb.citation = aa.citation)) > 1 ORDER BY text ASC;

I now have about 2000 rows (some passages have 2 to 6 duplicates). I need to delete the extra rows from the transcription table and change the transcription_id in the relation table, town_transcription, to point to the remaining, now unique transcription record. From reading other questions, I think UPDATE FROM may be needed, but I really don't know how to implement it. I'm just a beginner; thanks for your help.

Answer 1

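Use row_number() over(...) to identify the rows that repeat information. In the over clause, partition by text, citation forces the row number series to restart at 1 for each unique set of those values, so every row with rn > 1 is a duplicate of an earlier row: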

select
     *
from (
       select
              text, transcription_id, t_notes, citation
            , row_number() over(partition by text, citation 
                                order by transcription_id) as rn
       from transcription 
     ) d
where rn > 1

Once you have verified that these are the unwanted rows, use the same logic in a delete statement.
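A minimal sketch of what such a delete could look like, assuming PostgreSQL (for DELETE ... USING) and assuming that no row in town_transcription still references the duplicates; if any do, the foreign key will reject the delete until those references are redirected:

```sql
-- Sketch only: removes every row whose row number within its
-- (text, citation) group is greater than 1, keeping the lowest ID.
-- Assumes referencing rows in town_transcription were re-pointed first.
DELETE FROM transcription t
USING (
    SELECT transcription_id
         , row_number() OVER (PARTITION BY text, citation
                              ORDER BY transcription_id) AS rn
    FROM   transcription
) d
WHERE  d.transcription_id = t.transcription_id
AND    d.rn > 1;
```

Running it inside a transaction (BEGIN; ... ROLLBACK;) first is a cheap way to see how many rows it would remove.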

However, you may lose information held in the t_notes column of the deleted rows. Are you willing to accept that?
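One way to judge that risk before deleting anything is to list the duplicate groups whose copies carry different notes. This is only a sketch; the column aliases copies and distinct_notes are made up for illustration:

```sql
-- Lists duplicate groups in which the copies do not all share the same note.
-- count(DISTINCT ...) ignores NULLs, so a group with one note and
-- otherwise NULL notes is NOT reported here.
SELECT text
     , citation
     , count(*)                AS copies
     , count(DISTINCT t_notes) AS distinct_notes
FROM   transcription
GROUP  BY text, citation
HAVING count(*) > 1
AND    count(DISTINCT t_notes) > 1;
```

If this returns no rows, the surviving row in each group probably already holds whatever note text the copies had, apart from the NULL case mentioned in the comment.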

Answer 2

This single command should do it all:

WITH blacklist AS (  -- identify duplicate IDs and their master
   SELECT *
   FROM  (
      SELECT transcription_id
           , min(transcription_id) OVER (PARTITION BY text, citation) AS master_id
      FROM   transcription
      ) sub
   WHERE  transcription_id <> master_id
   )
, upd AS (  -- redirect referencing rows
   UPDATE town_transcription tt
   SET    transcription_id = b.master_id
   FROM   blacklist b
   WHERE  b.transcription_id = tt.transcription_id
   )
DELETE FROM transcription t  -- kill dupes (now without reference)
USING  blacklist b
WHERE  b.transcription_id = t.transcription_id;

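Since the question does not define which row should survive, I chose the row with the smallest ID in each group as the surviving master row.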
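One thing that may be worth checking first: if two copies of the same passage were linked to the same town, redirecting both to the master row would produce two identical (town_id, transcription_id) pairs. Whether that is an error depends on whether town_transcription has a primary key or unique constraint on that pair, which the question does not say. A sketch of a pre-check, reusing the master_id logic from the command above:

```sql
-- Reports (town, master) pairs that would occur more than once
-- after the redirect. An empty result means no collisions.
SELECT tt.town_id
     , m.master_id
     , count(*) AS links
FROM   town_transcription tt
JOIN  (
    SELECT transcription_id
         , min(transcription_id) OVER (PARTITION BY text, citation) AS master_id
    FROM   transcription
) m USING (transcription_id)
GROUP  BY tt.town_id, m.master_id
HAVING count(*) > 1;
```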

Unless you have non-default settings, FK constraints do not get in the way.
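To see whether the foreign keys on town_transcription use default settings, the system catalog pg_constraint can be inspected. A sketch, assuming the table lives in a schema on the current search_path:

```sql
-- confupdtype / confdeltype: 'a' = NO ACTION (the default),
-- 'r' = RESTRICT, 'c' = CASCADE, 'n' = SET NULL, 'd' = SET DEFAULT.
-- condeferrable / condeferred are both false by default.
SELECT conname
     , confupdtype
     , confdeltype
     , condeferrable
     , condeferred
FROM   pg_constraint
WHERE  conrelid = 'town_transcription'::regclass
AND    contype  = 'f';
```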

After removing the duplicates, you may want to add a UNIQUE constraint to prevent the same mistake from happening again:

ALTER TABLE transcription
ADD CONSTRAINT transcription_uni UNIQUE (text, citation);
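Adding the constraint fails if any duplicates remain, so a quick check beforehand can help. A sketch using a plain GROUP BY:

```sql
-- Should return no rows once the cleanup has succeeded.
SELECT text
     , citation
     , count(*) AS copies
FROM   transcription
GROUP  BY text, citation
HAVING count(*) > 1;
```

Note that a UNIQUE constraint treats NULL values as distinct by default, so rows with a NULL citation would not be prevented from repeating.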

Delete duplicates from a table and re-link referencing rows to the new master
