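I am using Amazon Redshift.
I have a column that stores a comma-separated string, such as Private, Private, Private, Private, Private, Private, United Healthcare. I want to remove the duplicates from it with a query, so the result should be Private, United Healthcare. I found some obvious solutions on Stack Overflow and learned that this should be possible with a regular expression.
So I tried: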
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\1') AS insurances;
and
SELECT regexp_replace('Private, Private, Private, Private, Private, Private, United Healthcare', '([^,]+)(,\1)+', '\g') AS insurances;
as well as a few other regular expressions, but none of them seem to work. Is there a solution?
Answer 0 (score: 2)
Try it this way:
SELECT array_agg(DISTINCT insurances)
FROM (SELECT regexp_split_to_table('Private, Private, Private, Private, Private, Private, United Healthcare'
, ',\s+') AS insurances) x;
An alternative way:
SELECT DISTINCT UNNEST(regexp_split_to_array('Private, Private, Private, Private, Private, Private, United Healthcare', ',\s+')) AS insurances;
Note: check http://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html. Both queries will fail on Redshift, because none of its string functions converts text to text[].
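Answer 1 (score: 2)
Another option is to try a Python UDF. A simple Python function can remove the duplicates from the string and return the cleaned-up version.
Answer 2 (score: 2)
Here is a user-defined function (UDF) for Amazon Redshift: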
CREATE FUNCTION f_uniquify (s text)
RETURNS text
IMMUTABLE
AS $$
  # Split string by comma-space, remove duplicates, convert back to comma-separated
  return ', '.join(set(s.split(', ')))
$$ LANGUAGE plpythonu;
Test it with:
select f_uniquify('Private, Private, Private, Private, Private, Private, United Healthcare');
Returns:
United Healthcare, Private
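A Python set is unordered, so the order of the returned values is not guaranteed. If the order matters, more specific code is needed.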
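For the order-sensitive case, one possible approach is to track the items already seen and keep only the first occurrence of each. The following is a minimal sketch in plain Python, not part of the original answer: the name uniquify_ordered and the sep parameter are hypothetical, and the None check assumes that a NULL column value reaches the function as None. The same logic should be usable as the body of a plpythonu UDF, but that has not been verified here.

```python
def uniquify_ordered(s, sep=', '):
    """Remove duplicate items from a delimited string, keeping first-seen order.

    Hypothetical helper for illustration; `sep` is assumed to be comma-space,
    matching the sample data in the question.
    """
    if s is None:
        # Assumption: a NULL input arrives as None; pass it through unchanged.
        return None
    seen = set()
    result = []
    for item in s.split(sep):
        item = item.strip()  # tolerate stray whitespace around items
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return sep.join(result)


print(uniquify_ordered(
    'Private, Private, Private, Private, Private, Private, United Healthcare'
))
# prints: Private, United Healthcare
```

The loop avoids relying on dictionary ordering, so it behaves the same on Python 2.7 (the version behind plpythonu) and Python 3.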