Sort the columns within a row in BigQuery

Date: 2016-02-03 07:34:58

Tags: google-bigquery

I have a table like the following:

id  a   b   c    
1   2   1   3    
2   3   2   1    
3   16  14  15   
4   10  12  13   
5   15  16  14   
6   10  12  8    

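I need to "normalize" this table by sorting the values of columns a, b and c within each row, then collapsing identical rows and counting how many duplicates each one has.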

Expected result:

a   b   c   dups     
1   2   3   2    
14  15  16  2    
10  12  13  1    
8   10  12  1    
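To pin down what "normalize" means here, the same transformation can be modelled in plain Python. This is a reference sketch for checking results, not BigQuery code. The function name `normalize` and the hard-coded sample rows are only illustrative. The sketch sorts the values of each row, drops the id, and counts identical sorted tuples. Because `sorted` works the same no matter how many value columns a row has, the idea does not depend on the column count.

```python
from collections import Counter


def normalize(rows):
    """Sort the values of each row and count identical sorted rows.

    `rows` is a list of (id, a, b, c, ...) tuples; the id is dropped.
    Returns a Counter mapping each sorted value tuple to its dup count.
    """
    return Counter(tuple(sorted(row[1:])) for row in rows)


# Sample data from the question: (id, a, b, c)
table = [
    (1, 2, 1, 3),
    (2, 3, 2, 1),
    (3, 16, 14, 15),
    (4, 10, 12, 13),
    (5, 15, 16, 14),
    (6, 10, 12, 8),
]

# Counter keeps first-seen order, so rows print in order of first appearance
for values, dups in normalize(table).items():
    print(*values, dups)
```

Running it prints the four rows of the expected result above, in the same order.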

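I do have a solution, but I don't know how to "scale" it easily when there are more than 3 columns to normalize. As you can see below, the first and last columns are not a problem. It is the middle columns that get messy once there are more than 3 columns.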

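-- least()/greatest() give the first and last sorted values;
-- the middle value is whichever of a, b, c is neither of them
select a, b, c, count(1) as dups from (
select a1 as a, if(a != a1 and a != c1, a, if(b != a1 and b != c1, b, c)) as b, c1 as c
from (select a, b, c, least(a, b, c) as a1, greatest(a, b, c) as c1 from table)
) group by a, b, c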

Can anyone suggest another approach?
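Transcribing the question's expressions into plain Python shows why the middle column is the awkward one. It also lets us check repeated values, which the sample data does not contain. The function names below are hypothetical. The nested IF falls through to `c` whenever `a` and `b` both equal the minimum or the maximum, so a row such as (1, 1, 3) comes out as (1, 3, 3). For exactly three numeric columns, a tie-safe middle value is `a + b + c - least(a, b, c) - greatest(a, b, c)`. With four columns, however, that difference gives only the sum of the two middle values, so this trick does not remove the scaling problem.

```python
def question_middle(a, b, c):
    """Python transcription of the question's three-column expression."""
    a1 = min(a, b, c)  # least(a, b, c)
    c1 = max(a, b, c)  # greatest(a, b, c)
    # if(a != a1 and a != c1, a, if(b != a1 and b != c1, b, c))
    if a != a1 and a != c1:
        mid = a
    elif b != a1 and b != c1:
        mid = b
    else:
        mid = c
    return a1, mid, c1


def tie_safe_middle(a, b, c):
    """Middle value computed as total minus smallest minus largest (numeric only)."""
    lo, hi = min(a, b, c), max(a, b, c)
    return lo, a + b + c - lo - hi, hi


print(question_middle(10, 12, 8))  # sample row without ties
print(question_middle(1, 1, 3))    # repeated value: middle is picked wrongly
print(tie_safe_middle(1, 1, 3))
```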


Answer (score: 1):

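The example below works for 4 columns (a, b, c, d). It can be adapted to any number of columns by adding one more STRING(x) to the CONCAT() and one more REGEXP_EXTRACT line for each additional column.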

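-- Outer query: count identical sorted rows
SELECT a, b, c, d, COUNT(1) AS dups 
FROM (
  -- Pull the k-th element of the sorted, comma-separated list into its own column
  SELECT id,  
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){0}(.*),') AS a, 
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){1}(.*),') AS b, 
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){2}(.*),') AS c, 
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){3}(.*),') AS d
  FROM (
    -- Glue the values of each row back into one string, in sorted order
    SELECT id, GROUP_CONCAT(s) AS s FROM (
      -- Cast to INTEGER so values sort numerically, and number them within each row
      SELECT id, s, 
        INTEGER(s) AS e, 
        ROW_NUMBER() OVER(PARTITION BY id ORDER BY e) pos
      FROM (
        -- Turn each row into one comma-separated string and split it into one row per value
        SELECT id,  
          SPLIT(CONCAT(STRING(a),',',STRING(b),',',STRING(c),',',STRING(d))) AS s 
        FROM table
      ) ORDER BY id, pos
    ) GROUP BY id
  )
) GROUP BY a, b, c, d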
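The answer's pipeline can be traced for a single row in plain Python. The row is turned into a comma-separated string, split into one row per value, ordered by the value cast to INTEGER, glued back together with GROUP_CONCAT, and then the k-th element is pulled out with a regex. This is only a model of the logic, not BigQuery code, and the function name is hypothetical. In RE2 (used by BigQuery), the `(?U)` flag makes quantifiers lazy by default. Python's `re` has no such switch here, so the sketch writes `.*?` explicitly. The sketch also shows why the `INTEGER(s)` cast matters: sorted as strings, 10, 12 and 8 would come out as '10', '12', '8'. As with REGEXP_EXTRACT, the extracted pieces are strings.

```python
import re


def sort_row_via_strings(values):
    """Model one row of the answer's query: CONCAT/SPLIT, numeric sort,
    GROUP_CONCAT, then positional REGEXP_EXTRACT."""
    s = ",".join(str(v) for v in values)  # CONCAT(STRING(a), ',', STRING(b), ...)
    parts = s.split(",")                  # SPLIT(...): one element per value
    parts.sort(key=int)                   # ORDER BY INTEGER(s), i.e. numeric order
    joined = ",".join(parts)              # GROUP_CONCAT(s)
    # REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){k}(.*),') skips k items, captures the next
    return [
        re.match(r"^(?:.*?,){%d}(.*?)," % k, joined + ",").group(1)
        for k in range(len(parts))
    ]


print(sort_row_via_strings((10, 12, 8)))    # numeric order
print(sorted(str(v) for v in (10, 12, 8)))  # string order, without the INTEGER cast
```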