Is it possible to remove duplicates from a string in BigQuery?

Time: 2018-09-03 14:38:06

Tags: google-bigquery

I have been working with some data, and it is currently output in rows like this:

Customer | Reasons
Customer1 | Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1

Is there anything in BigQuery Standard SQL that removes the duplicates from this string, so that I end up with the output below?

Customer | Reasons
Customer1 | Answer1, Answer3, Answer2, Answer4, Answer5

Thanks in advance

2 answers:

Answer 0 (score: 4)

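Assuming I understood the question correctly, you need something along these lines.

This splits the string on the ', ' delimiter, then aggregates the substrings back into a new string, using the DISTINCT keyword to remove the duplicates.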
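The query itself did not survive in this copy of the answer. Below is a minimal sketch of the approach it describes, assuming the same sample table `project.dataset.table` (columns `customer` and `answers`) that the other answer uses: split the string with `SPLIT`, then rebuild it with `STRING_AGG(DISTINCT ...)`.

```sql
#standardSQL
-- Sample data mirroring the question (assumed table and column names).
WITH `project.dataset.table` AS (
  SELECT 'Customer1' AS customer,
         'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' AS answers
)
SELECT
  customer,
  -- Split on ', ', keep each distinct piece once, and join the pieces again with ', '.
  (SELECT STRING_AGG(DISTINCT answer, ', ')
   FROM UNNEST(SPLIT(answers, ', ')) AS answer) AS answers
FROM `project.dataset.table`
```

Without an ORDER BY, `STRING_AGG` does not guarantee the order of the aggregated values. If you want them sorted, `STRING_AGG(DISTINCT answer, ', ' ORDER BY answer)` is allowed, because the ORDER BY key is the same expression as the DISTINCT argument.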

Answer 1 (score: 2)

I upvoted Elliott's answer, but wanted to add another option (BigQuery Standard SQL):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Customer1' customer, 'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' answers 
)
SELECT * REPLACE(
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT answer
    FROM UNNEST(SPLIT(answers, ', ')) AS answer 
  ), ', ') AS answers)   
FROM `project.dataset.table`    

This produces the desired result:

Row customer    answers  
1   Customer1   Answer1, Answer3, Answer2, Answer4, Answer5   
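Without an ORDER BY, neither `SELECT DISTINCT` nor `ARRAY(...)` guarantees the order of the values, so the first-occurrence order in this result is not something to rely on. If that order matters, a sketch (same assumed sample table) keeps each value's position with `WITH OFFSET` and sorts by its first occurrence:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Customer1' AS customer,
         'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' AS answers
)
SELECT * REPLACE(
  ARRAY_TO_STRING(ARRAY(
    SELECT answer
    -- pos is the 0-based position of each piece in the original string.
    FROM UNNEST(SPLIT(answers, ', ')) AS answer WITH OFFSET AS pos
    -- One row per distinct value.
    GROUP BY answer
    -- The earliest position of each value decides the output order.
    ORDER BY MIN(pos)
  ), ', ') AS answers)
FROM `project.dataset.table`
```

`ARRAY(subquery)` honors an ORDER BY in the subquery, so the joined string follows first-occurrence order.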

If for some reason you want the values sorted, just add an ORDER BY line, as below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Customer1' customer, 'Answer1, Answer3, Answer2, Answer4, Answer5, Answer1, Answer3, Answer1' answers 
)
SELECT * REPLACE(
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT answer
    FROM UNNEST(SPLIT(answers, ', ')) AS answer 
    ORDER BY answer
  ), ', ') AS answers)   
FROM `project.dataset.table`     

with the result:

Row customer    answers  
1   Customer1   Answer1, Answer2, Answer3, Answer4, Answer5      

Note: sorting is most likely irrelevant to the specific use case in your question, but it can be handy in other cases.
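Both queries assume the items are separated by exactly ', '. If the spacing in your data is inconsistent, a variant (a sketch using a hypothetical sample row) splits on ',' alone and trims each piece before deduplicating:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Hypothetical sample row with inconsistent spacing around the commas.
  SELECT 'Customer1' AS customer, 'Answer1,Answer3, Answer2,Answer1 ,Answer3' AS answers
)
SELECT * REPLACE(
  ARRAY_TO_STRING(ARRAY(
    -- Trim before DISTINCT so 'Answer1' and 'Answer1 ' count as the same value.
    SELECT DISTINCT TRIM(item) AS answer
    FROM UNNEST(SPLIT(answers, ',')) AS item
    -- Drop empty pieces, e.g. from a trailing comma.
    WHERE TRIM(item) != ''
    ORDER BY answer
  ), ', ') AS answers)
FROM `project.dataset.table`
```

For this sample row the query should return `Answer1, Answer2, Answer3`.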