Sql从逗号分隔的字符串中删除重复项

时间:2018-03-19 11:10:12

标签: sql sql-server tsql split

我想在sql-server中进行查询,这可以使得以下输出像column_A.Columns中的列_B一样是varchar类型。

  Column_A                                 column_B
  karim,karim,rahim,masud,raju,raju        karim,rahim,masud,raju
  jon,man,jon,kamal,kamal                  jon,man,kamal
  c,abc,abc,pot                            c,abc,pot

2 个答案:

答案 0 :(得分:2)

首先:你在评论中被告知,这是一个非常糟糕的设计(违反1.NF)!如果你有机会改变这一点,你真的应该...... 永远不要在一个单元格中存储多个值!

如果你必须坚持这个(或为了修复这个烂摊子),你可以这样:

这是我能想到的最简单的方法:将CSV转换为XML并调用XQuery - 函数distinct-values()

DECLARE @tbl TABLE(ColumnA VARCHAR(MAX));
INSERT INTO @tbl VALUES
 ('karim,karim,rahim,masud,raju,raju')
,('jon,man,jon,kamal,kamal')
,('c,abc,abc,pot');

WITH Splitted AS
(
    SELECT ColumnA 
          ,CAST('<x>' + REPLACE(ColumnA,',','</x><x>') + '</x>' AS XML) AS TheParts
    FROM @tbl 
)
SELECT ColumnA
      ,TheParts.query('distinct-values(/x/text())').value('.','varchar(250)') AS ColumnB
FROM Splitted;

结果

ColumnA                             ColumnB
karim,karim,rahim,masud,raju,raju   karim rahim masud raju
jon,man,jon,kamal,kamal             jon man kamal
c,abc,abc,pot                       c abc pot

更新保留逗号

WITH Splitted AS
(
    SELECT ColumnA 
          ,CAST('<x>' + REPLACE(ColumnA,',','</x><x>') + '</x>' AS XML) AS TheParts
    FROM @tbl 
)
SELECT ColumnA
      ,STUFF(
          (TheParts.query
          ('
          for $x in distinct-values(/x/text())
            return <x>{concat(",", $x)}</x>
          ').value('.','varchar(250)')),1,1,'') AS ColumnB
FROM Splitted;

结果

ColumnB
karim,rahim,masud,raju
jon,man,kamal
c,abc,pot

答案 1 :(得分:0)

SQL 从逗号分隔的字符串中删除重复项:

伪代码:创建一个 postgresql 函数,接收逗号分隔的字符串作为输入,并在内存中创建另一个数组。用逗号分割字符串,修剪空格并枚举每个项目,如果该项目没有出现在新列表中,则添加它。最后将新数组展平为字符串并返回。

drop function if exists remove_duplicates_from_comma_separated_string(text);

CREATE or replace FUNCTION remove_duplicates_from_comma_separated_string(arg1 text) 
RETURNS text language plpgsql AS $$ declare 
  item text;  
  split_items text[];  
  ret_items text[];  
  ret_val text; 
BEGIN 
  --split your string on commas and trim whitespace 
  split_items := string_to_array(ltrim(arg1), ','); 
  --enumerate each item, if it doesn't exist in the new array then add it. 
  FOREACH item IN ARRAY split_items LOOP 
    if ( item::text = ANY(ret_items)) then 
    else 
        --append this unique item into ret_items 
        select array_append(ret_items, ltrim(item)) into ret_items; 
    end if;  
  END LOOP; 
  --flatten the final array to a text with comma delimiter 
  SELECT array_to_string(ret_items, ',', '*') into ret_val; 
  return ret_val; 
END; $$;

所以现在我们可以这样调用表上的函数:

drop table if exists foo_table; 
create table foo_table(name text); 
insert into foo_table values('karim,karim,rahim,masud,raju,raju'); 
insert into foo_table values('jon,man,jon,kamal,kamal'); 
insert into foo_table values('jon,man,kamal'); 
insert into foo_table values('c,abc,poty'); 
insert into foo_table values('c,abc,abc,kotb'); 
select remove_duplicates_from_comma_separated_string(name) from foo_table; 

打印:

┌───────────────────────────────────────────────┐ 
│ remove_duplicates_from_comma_separated_string │ 
├───────────────────────────────────────────────┤ 
│ karim,rahim,masud,raju                        │ 
│ jon,man,kamal                                 │ 
│ jon,man,kamal                                 │ 
│ c,abc,poty                                    │ 
│ c,abc,kotb                                    │ 
└───────────────────────────────────────────────┘ 

代码气味 haaax 系数:9.5 of 10。施工人员看着新手程序员用 90 美元的 sql 牌管扳手敲钉子,每个人都翻白眼。