I have the following table:
id a b c
1 2 1 3
2 3 2 1
3 16 14 15
4 10 12 13
5 15 16 14
6 10 12 8
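I need to "normalize" this table by sorting the values in columns a, b, c within each row, and then counting how many times each sorted combination occurs (dups).
Expected result: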
a b c dups
1 2 3 2
14 15 16 2
10 12 13 1
8 10 12 1
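Here is a minimal Python sketch of the intended result. It only illustrates the semantics and is not BigQuery code. It sorts each row and counts identical sorted tuples. The `rows` dict is the sample table above, and `normalize` is a hypothetical helper name. Because `sorted()` accepts rows of any length, the same logic covers 4 or more columns.

```python
from collections import Counter

# Sample table from the question: id -> (a, b, c)
rows = {
    1: (2, 1, 3),
    2: (3, 2, 1),
    3: (16, 14, 15),
    4: (10, 12, 13),
    5: (15, 16, 14),
    6: (10, 12, 8),
}


def normalize(rows):
    """Sort each row's values, then count identical sorted tuples.

    Works for any number of columns, since sorted() does not care
    how many values a row has.
    """
    return Counter(tuple(sorted(values)) for values in rows.values())


if __name__ == "__main__":
    # Counter keeps first-seen order, so this prints the rows in the
    # same order as the expected result above.
    for key, dups in normalize(rows).items():
        print(*key, dups)
```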
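I do have a solution, but I don't see how to "scale" it easily when I have more than 3 columns to normalize. As you can see below, the first and last columns are not a problem. It is the middle columns that get messy once there are more than 3 columns.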
select a, b, c, count(1) as dups from (
  select a1 as a,
         if(a != a1 and a != c1, a, if(b != a1 and b != c1, b, c)) as b,
         c1 as c
  from (select a, b, c, least(a, b, c) as a1, greatest(a, b, c) as c1 from table)
) group by a, b, c
Can anyone suggest a different approach?
Answer
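The example below (BigQuery Legacy SQL) works for 4 columns. It splits each row's values into separate rows, numbers them by numeric value, concatenates them back in sorted order, and then extracts the n-th value with REGEXP_EXTRACT. To adapt it to any number of columns, add an extra STRING(x) to the CONCAT() and an extra REGEXP_EXTRACT line for each additional column, and add the new column to the outer SELECT and GROUP BY.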
SELECT a, b, c, d, COUNT(1) AS dups
FROM (
  SELECT id,
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){0}(.*),') AS a,
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){1}(.*),') AS b,
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){2}(.*),') AS c,
    REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){3}(.*),') AS d
  FROM (
    SELECT id, GROUP_CONCAT(s) AS s FROM (
      SELECT id, s,
        INTEGER(s) AS e,
        ROW_NUMBER() OVER(PARTITION BY id ORDER BY e) pos
      FROM (
        SELECT id,
          SPLIT(CONCAT(STRING(a),',',STRING(b),',',STRING(c),',',STRING(d))) AS s
        FROM table
      ) ORDER BY id, pos
    ) GROUP BY id
  )
) GROUP BY a, b, c, d
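Mirroring the string pipeline of this query in Python shows what each step does. This sketch rests on a few assumptions. The helper names `nth_field` and `normalize_via_strings` are hypothetical. It uses the question's 3-column sample rather than 4 columns. Python's `re` has no ungreedy flag (`(?U)` means Unicode there, unlike in RE2), so the lazy quantifiers `.*?` are written out explicitly.

```python
import re
from collections import Counter

# Sample table from the question: id -> (a, b, c)
rows = {
    1: (2, 1, 3),
    2: (3, 2, 1),
    3: (16, 14, 15),
    4: (10, 12, 13),
    5: (15, 16, 14),
    6: (10, 12, 8),
}


def nth_field(s, n):
    """Return the n-th (0-based) comma-separated field of s, or None.

    Mirrors REGEXP_EXTRACT(s + ',', r'(?U)^(?:.*,){n}(.*),'): skip n
    comma-terminated fields, then capture the next one. Lazy .*? stands
    in for RE2's (?U) ungreedy flag, which Python's re does not have.
    """
    m = re.match(r'^(?:.*?,){%d}(.*?),' % n, s + ',')
    return m.group(1) if m else None


def normalize_via_strings(rows, width):
    counts = Counter()
    for values in rows.values():
        # CONCAT(STRING(a), ',', STRING(b), ...) followed by SPLIT(...)
        parts = ','.join(str(v) for v in values).split(',')
        # ROW_NUMBER() ... ORDER BY INTEGER(s), then GROUP_CONCAT in that order
        s = ','.join(sorted(parts, key=int))
        # One REGEXP_EXTRACT per output column, then GROUP BY all columns
        counts[tuple(nth_field(s, n) for n in range(width))] += 1
    return counts


if __name__ == "__main__":
    for key, dups in normalize_via_strings(rows, 3).items():
        print(*key, dups)
```

Sorting with `key=int` matters. A plain string sort would put `'10'` and `'12'` before `'8'`, which is why the query orders by `INTEGER(s)` rather than by `s`. As in the query, the extracted values come back as strings.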