Purge duplicate rows under custom conditions

Date: 2014-03-05 14:50:09

Tags: mysql sql

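I have a problem managing notes. I started with a strategy of always INSERTing a new note and SELECTing the latest one. Please don't laugh; I must have thought it was a good idea at the time. But now the system is not even fully in production, and about 300k rows have been inserted in roughly a month. In two years my system will fail. I need to merge the duplicate rows. Here is the structure of my notes table: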

CREATE TABLE IF NOT EXISTS `ps_notes` (
  `CodeNTE` int(11) NOT NULL AUTO_INCREMENT,
  `CodePRS` int(11) NOT NULL,
  `CodeXYZ` int(11) NOT NULL,
  `Type` char(3) NOT NULL,
  `Focus` char(3) NOT NULL,
  `Texte` tinytext NOT NULL,
  `Date` datetime NOT NULL,
  PRIMARY KEY (`CodeNTE`),
  KEY `CodeXYZ` (`CodeXYZ`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=335068 ;

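A note may be related to a person (CodePRS) and must be related to a Type, a Focus and a CodeXYZ. Each note has a Texte entry, and sometimes I want to know its Date.

CodeXYZ is the identifier of the entity the note is attached to. This identifier can come from any table, so it is not globally unique, hence the Type field, which specifies which table the parent row comes from. The Focus field distinguishes notes that refer to the same CodeXYZ and Type.

Here are some sample rows: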

+---------+------+-------+-------------+------------+
| CodeXYZ | Type | Focus |    Texte    |    Date    |
+---------+------+-------+-------------+------------+
| 30008   | ctr  | adm   | Whatever    | 2013-05-09 |
| 30008   | ctr  | adm   | Whatever    | 2013-06-10 |
| 30008   | ctr  | adm   | Lorem ipsum | 2013-06-11 |
| 30008   | ctr  | clt   | He's cool   | 0000-00-00 |
| 2546    | ctr  | sup   | Another     | 2013-02-11 |
| 2546    | ctr  | sup   | Another     | 2013-02-11 |
| 2546    | ctr  | sup   | Another     | 2013-02-19 |
+---------+------+-------+-------------+------------+

Here is the output I want:

+---------+------+-------+-------------+-----------------------------------------+
| CodeXYZ | Type | Focus |    Texte    |                  Date                   |
+---------+------+-------+-------------+-----------------------------------------+
| 30008   | ctr  | adm   | Lorem ipsum | 2013-06-11 (I want the most recent one) |
| 30008   | ctr  | clt   | He's cool   | 0000-00-00                              |
| 2546    | ctr  | sup   | Another     | 2013-02-11                              |
| 2546    | ctr  | sup   | Another     | 2013-02-19                              |
+---------+------+-------+-------------+-----------------------------------------+

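Merge conditions

  1. When Focus is not 'sup', I want to merge rows that have the same CodeXYZ, Type and Focus.
  2. When Focus is 'sup', I want to merge rows that have the same CodeXYZ, Type, Focus and Date.
  3. I always want to keep the most recent one.

So I ran this query to merge the rows into a temporary table: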

    INSERT INTO notes_tmp (CodePRS,CodeXYZ,Type,Focus,Texte,Date)
      SELECT CodePRS,CodeXYZ,Type,Focus,Texte,Date 
      FROM notes 
      GROUP BY CodeXYZ,Type,Focus
    

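But this way all the rows get merged, even the last ones (the 'sup' rows, which should stay separate when their dates differ).

So I came up with this: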

    INSERT INTO notes_tmp (CodePRS,CodeXYZ,Type,Focus,Texte,Date)
      SELECT CodePRS,CodeXYZ,Type,Focus,Texte,Date 
      FROM notes 
      WHERE Focus<>'sup'
      GROUP BY CodeXYZ,Type,Focus
      ORDER BY Date DESC
    UNION
      SELECT CodePRS,CodeXYZ,Type,Focus,Texte,Date 
      FROM notes
      WHERE Focus='sup'
      GROUP BY CodeXYZ,Type,Focus,Date
      ORDER BY Date DESC
    

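But the UNION is not in the right place, and I don't think I can use it in the INSERT INTO ... SELECT syntax.

Is there a way to copy these rows in a single MySQL call, with several subqueries, each based on different conditions, all ending up in the same table?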
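For reference, the three merge conditions can be written as one statement. This is only a sketch under a loud assumption: because every edit was stored as a new INSERT, the row with the highest CodeNTE in a group is taken to be the most recent one. It also assumes that `notes` has the same columns as `ps_notes` and that `notes_tmp` already exists with the listed columns.

    INSERT INTO notes_tmp (CodePRS, CodeXYZ, Type, Focus, Texte, Date)
    SELECT n.CodePRS, n.CodeXYZ, n.Type, n.Focus, n.Texte, n.Date
    FROM notes AS n
    JOIN (
        -- Condition 1: one survivor per (CodeXYZ, Type, Focus) when Focus is not 'sup'
        SELECT MAX(CodeNTE) AS CodeNTE
        FROM notes
        WHERE Focus <> 'sup'
        GROUP BY CodeXYZ, Type, Focus
      UNION ALL
        -- Condition 2: one survivor per (CodeXYZ, Type, Focus, Date) when Focus is 'sup'
        SELECT MAX(CodeNTE)
        FROM notes
        WHERE Focus = 'sup'
        GROUP BY CodeXYZ, Type, Focus, Date
    ) AS keep ON keep.CodeNTE = n.CodeNTE;  -- assumption: highest CodeNTE = newest note

Joining back on the primary key copies each surviving row whole, so CodePRS, Texte and Date always come from the same original note. If insertion order does not match recency in your data, the inner queries would need to pick the row by Date instead.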

2 answers:

Answer 0: (score: 1)

You can use GROUP_CONCAT to merge the text field and make the other columns unique with GROUP BY. Try this:

INSERT INTO notes_temp
SELECT CodeXYZ,Type, Focus,GROUP_CONCAT(Texte),Date 
FROM notes WHERE Focus = 'sup'
GROUP BY CodeXYZ,Type, Focus,Date;

INSERT INTO notes_temp
SELECT CodeXYZ,Type, Focus,GROUP_CONCAT(Texte),MAX(Date)
FROM notes WHERE Focus <> 'sup'
GROUP BY CodeXYZ,Type, Focus;

Check the sqlfiddle.
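As for the single-call part of the question: MySQL accepts UNION in the SELECT part of INSERT ... SELECT, so the two statements above can be combined into one. A minimal sketch, assuming `notes_temp` has exactly these five columns (the explicit column list is an addition for safety):

```sql
INSERT INTO notes_temp (CodeXYZ, Type, Focus, Texte, Date)
-- 'sup' notes: one row per (CodeXYZ, Type, Focus, Date)
SELECT CodeXYZ, Type, Focus, GROUP_CONCAT(Texte), Date
FROM notes
WHERE Focus = 'sup'
GROUP BY CodeXYZ, Type, Focus, Date
UNION ALL
-- all other notes: one row per (CodeXYZ, Type, Focus), carrying the latest date
SELECT CodeXYZ, Type, Focus, GROUP_CONCAT(Texte), MAX(Date)
FROM notes
WHERE Focus <> 'sup'
GROUP BY CodeXYZ, Type, Focus;
```

UNION ALL is enough here because the two branches select disjoint sets of rows. Keep in mind that GROUP_CONCAT joins every Texte of a group into one string, which may not fit a tinytext column (255 bytes at most) and could be truncated or rejected depending on the SQL mode.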

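Answer 1: (score: 0)

So, building on parts of @Volkan's answer, I came up with some odd but working SQL to get the right note out of my GROUP_CONCAT().

The CASE takes the last entry of the group concat. I used a different separator (,,,) because commas occur often in the text; three in a row is much less likely.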

INSERT INTO notes_temp
SELECT CodeXYZ,Type, Focus,Texte,Date 
FROM notes WHERE Focus = 'sup'
GROUP BY CodeXYZ,Type, Focus,Date;

INSERT INTO notes_temp
SELECT 
CodeXYZ,
Type, 
Focus,
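CASE
  WHEN COUNT(Texte) > 1
    -- Keep what follows the last ",,," separator.
    -- CHAR_LENGTH counts characters, as SUBSTR and INSTR do; LENGTH counts bytes and shifts the cut on multi-byte utf8 text.
    THEN SUBSTR(GROUP_CONCAT(Texte SEPARATOR ",,,"),((CHAR_LENGTH(GROUP_CONCAT(Texte SEPARATOR ",,,"))+2) - INSTR(REVERSE(GROUP_CONCAT(Texte SEPARATOR ",,,")),",,,")))
  ELSE
    Texte
  END
AS Texte,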
MAX(Date)
FROM notes WHERE Focus <> 'sup'
GROUP BY CodeXYZ,Type, Focus;
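
A note on the second statement: GROUP_CONCAT without an ORDER BY does not guarantee the order of the concatenated values, so the last entry is not necessarily the most recent note. A possible alternative, sketched here and not tested against the real data, is to sort inside GROUP_CONCAT and keep the first element with SUBSTRING_INDEX. Using CodeNTE as a tie-breaker assumes `notes` carries the same auto-increment key as `ps_notes`, and the explicit column list assumes `notes_temp` has these five columns.

```sql
INSERT INTO notes_temp (CodeXYZ, Type, Focus, Texte, Date)
SELECT
  CodeXYZ,
  Type,
  Focus,
  -- Sort each group's texts newest first, then keep everything before the first separator
  SUBSTRING_INDEX(
    GROUP_CONCAT(Texte ORDER BY Date DESC, CodeNTE DESC SEPARATOR ',,,'),
    ',,,',
    1
  ) AS Texte,
  MAX(Date) AS Date
FROM notes
WHERE Focus <> 'sup'
GROUP BY CodeXYZ, Type, Focus;
```

With a single row in the group there is no separator, and SUBSTRING_INDEX returns the whole string, so the CASE is no longer needed. Putting the newest text first also means that truncation at group_concat_max_len (1024 by default) only cuts off older entries. The sketch still breaks if a note itself contains ",,,".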