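MYSQL - Combine rows with multiple duplicate values and then delete the duplicates

Time: 2015-07-08 12:50:06

Tags: mysql sql concat group-concat

I have my database set up as a single table. In that table I have collected the source URL and a description (I am scraping product descriptions from a number of pages). Unfortunately, when a page has more than one paragraph, I end up with multiple rows in the database for the same URL/source page.

What I would like to do is: if there are multiple rows with the same URL, combine the descriptions from each of those rows, then delete the duplicate rows for that URL.

My table is structured like so: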

table             
+----+----------------------------+-------------+
| id | url                        | description |
+----+----------------------------+-------------+
|  1 | http://example.com/page-a  | paragraph 1 |
|  2 | http://example.com/page-a  | paragraph 2 |
|  3 | http://example.com/page-a  | paragraph 3 |
|  4 | http://example.com/page-b  | paragraph 1 |
|  5 | http://example.com/page-b  | paragraph 2 |
+----+----------------------------+-------------+

How I would like it to look:

table             
+----+----------------------------+-------------------------------------+
| id | url                        | description                         |
+----+----------------------------+-------------------------------------+
|  1 | http://example.com/page-a  | paragraph 1 paragraph 2 paragraph 3 |
|  2 | http://example.com/page-b  | paragraph 1 paragraph 2             |
+----+----------------------------+-------------------------------------+

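I'm not bothered about the ids being renumbered correctly. I just want to combine the rows whose paragraphs belong in the same field because they share a URL, and then delete the duplicates.

Any help would be greatly appreciated!

3 Answers:

Answer 0: (score: 2)

Combining the rows is easy; just insert the result of this query into a new table: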

SELECT url, GROUP_CONCAT(description ORDER BY description SEPARATOR ' ') AS description
FROM `table`
GROUP BY url
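
A minimal sketch of the "insert into a new table" step, assuming the table and column names from the question. The name `table_merged` is hypothetical, and ordering by `id` rather than by `description` is a swapped-in choice here, so that paragraphs keep their insertion order instead of being sorted alphabetically.

```sql
-- Sample data from the question (`table` is a reserved word, hence the backticks).
CREATE TABLE `table` (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    url         VARCHAR(255) NOT NULL,
    description TEXT
);

INSERT INTO `table` (url, description) VALUES
    ('http://example.com/page-a', 'paragraph 1'),
    ('http://example.com/page-a', 'paragraph 2'),
    ('http://example.com/page-a', 'paragraph 3'),
    ('http://example.com/page-b', 'paragraph 1'),
    ('http://example.com/page-b', 'paragraph 2');

-- table_merged is a hypothetical name for the de-duplicated copy.
-- One row per url; paragraphs are joined in id order, separated by a space.
CREATE TABLE table_merged AS
SELECT MIN(id) AS id,
       url,
       GROUP_CONCAT(description ORDER BY id SEPARATOR ' ') AS description
FROM `table`
GROUP BY url;

-- Inspect the result before replacing anything.
SELECT * FROM table_merged ORDER BY id;
```

Once the copy looks right, one option is to swap the tables with `RENAME TABLE`, keeping the original under another name as a backup. Note that a table created with `CREATE TABLE ... AS SELECT` does not inherit the original's indexes or `AUTO_INCREMENT` attribute, so those would need to be added back.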

Answer 1: (score: 1)

Create a new temporary table, truncate the original table, and then re-insert the data:

create temporary table tempt as
    select (@rn := @rn + 1) as id, url,
           group_concat(description order by id separator ' ') as description
    from t cross join (select @rn := 0) params
    group by url 
    order by min(id);

-- Do lots of testing and checking here to be sure you have the data you want.

truncate table t;

insert into t(id, url, description)
    select id, url, description
    from tempt;

If id is already auto-incremented in the table, then you don't need to provide a value for it.
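
One thing worth checking at the "testing and checking" step: `GROUP_CONCAT` cuts its result off at `group_concat_max_len`, which defaults to 1024 bytes, and scraped descriptions can easily be longer than that. The following is a sketch only, assuming the tables `t` and `tempt` from this answer; the limit of 1000000 is an arbitrary illustrative value. The `SET` has to run before `tempt` is created, and the check has to run before `truncate table t`.

```sql
-- Raise the limit for this session BEFORE building tempt
-- (1000000 is an arbitrary example value, not a recommendation).
SET SESSION group_concat_max_len = 1000000;

-- Run after creating tempt and BEFORE truncating t.
-- For each url, the combined text should be as long as all of its
-- paragraphs plus one separator character between each pair.
-- Any row returned here points to a truncated or incomplete description.
SELECT t.url,
       SUM(CHAR_LENGTH(t.description)) + COUNT(t.description) - 1 AS expected_len,
       MAX(CHAR_LENGTH(tempt.description)) AS actual_len
FROM t
JOIN tempt ON tempt.url = t.url
GROUP BY t.url
HAVING expected_len <> actual_len;
```

An empty result suggests nothing was cut off. Because `truncate table` cannot be rolled back, it seems safer to run a check like this first than to discover shortened descriptions afterwards.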

Answer 2: (score: 0)

SQL Server (T-SQL; STUFF and FOR XML PATH are not available in MySQL):

SELECT MIN(id) AS [ID],
       url,
       description = STUFF((SELECT '; ' + ic.description
                            FROM dbo.My_Table AS ic
                            WHERE ic.url = c.url
                            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, '')
FROM dbo.My_Table AS c
GROUP BY url
ORDER BY url;