SQL / Vertica - Grouping by a combination of multiple attributes

Time: 2016-09-27 13:57:33

Tags: sql group-by vertica

I have a dataset of the following form:

user_id   country1  city1      country2  city2
1         usa       new york   france    paris 
2         usa       dallas     japan     tokyo 
3         india     mumbai     italy     rome 
4         france    paris      usa       new york 
5         brazil    sao paulo  russia    moscow 

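I want to group by the combination of country1, city1, country2, and city2, where the order of the two locations (which one appears as country1 and which as country2) does not matter. Normally, I would try: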

SELECT   country1 
       , city1
       , country2
       , city2 
       , COUNT(*) 
FROM dataset
GROUP BY country1 
       , city1
       , country2
       , city2 

However, this query treats the rows with user_id=1 and user_id=4 as two separate cases, whereas I want them to be treated as equivalent.

Does anyone know how to solve this?

Thanks in advance!

1 answer:

Answer 0 (score: 1)

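Normally you would use least() and greatest() to solve this kind of problem, but here each location spans two columns rather than one. So let's do it by comparing the cities. I'm guessing that city is more unique than country: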

select (case when city1 < city2 then country1 else country2 end) as country1,
       (case when city1 < city2 then city1 else city2 end) as city1,
       (case when city1 < city2 then country2 else country1 end) as country2,
       (case when city1 < city2 then city2 else city1 end) as city2,
       count(*)
from dataset
group by (case when city1 < city2 then country1 else country2 end),
       (case when city1 < city2 then city1 else city2 end),
       (case when city1 < city2 then country2 else country1 end),
       (case when city1 < city2 then city2 else city1 end)
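
One caveat with comparing on city alone: if two different countries share a city name, then city1 = city2 and the two orderings of that pair would still land in different groups. A minimal sketch of a variant that orders each pair by (country, city) instead is shown below. It assumes none of the four columns contain NULLs, and the names swap, flagged, normalized, country_a, city_a, country_b, city_b, and pair_count are introduced here purely for illustration:

```sql
-- Sketch only: normalize each row so the "smaller" (country, city) pair comes first,
-- then group on the normalized columns. Assumes no NULLs in the four columns.
select country_a, city_a, country_b, city_b, count(*) as pair_count
from (
    select (case when swap = 0 then country1 else country2 end) as country_a,
           (case when swap = 0 then city1    else city2    end) as city_a,
           (case when swap = 0 then country2 else country1 end) as country_b,
           (case when swap = 0 then city2    else city1    end) as city_b
    from (
        -- swap = 1 when location 2 sorts before location 1
        select d.*,
               (case when country1 < country2
                       or (country1 = country2 and city1 <= city2)
                     then 0 else 1 end) as swap
        from dataset d
    ) flagged
) normalized
group by country_a, city_a, country_b, city_b;
```

Computing the comparison once in a subquery also avoids repeating the same case expression in both the select list and the group by clause. With the sample data from the question, the rows for user_id=1 and user_id=4 both normalize to (france, paris, usa, new york) and so fall into the same group.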