I have a dataset of the following kind:
user_id  country1  city1      country2  city2
1        usa       new york   france    paris
2        usa       dallas     japan     tokyo
3        india     mumbai     italy     rome
4        france    paris      usa       new york
5        brazil    sao paulo  russia    moscow
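I want to group by the combination of country1, city1, country2 and city2, where it does not matter in which order the two sides appear (country1 first or country2 first). Normally I would try: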
SELECT country1, city1, country2, city2, COUNT(*)
FROM dataset
GROUP BY country1, city1, country2, city2
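To reproduce the problem, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's standard `sqlite3` module (the question does not name its DBMS, so SQLite is an assumption) and runs the naive query. Because the grouping key depends on column order, users 1 and 4 end up in separate groups.

```python
import sqlite3

# In-memory copy of the sample dataset from the question (SQLite assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (user_id INTEGER, country1 TEXT, city1 TEXT, "
    "country2 TEXT, city2 TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        (1, "usa", "new york", "france", "paris"),
        (2, "usa", "dallas", "japan", "tokyo"),
        (3, "india", "mumbai", "italy", "rome"),
        (4, "france", "paris", "usa", "new york"),
        (5, "brazil", "sao paulo", "russia", "moscow"),
    ],
)

# The naive query: the grouping key is order-sensitive.
naive = conn.execute(
    """
    SELECT country1, city1, country2, city2, COUNT(*)
    FROM dataset
    GROUP BY country1, city1, country2, city2
    """
).fetchall()

for row in naive:
    print(row)
# Every group has a count of 1: users 1 and 4 are not merged.
```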
However, this query treats the rows with user_id=1 and user_id=4 as two separate cases, whereas I want them to be treated as equivalent.
Does anyone know how to solve this?
Thanks in advance!
Answer 0 (score: 1)
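Usually you would use least() and greatest() for this kind of problem, but here each side of the pair consists of two columns (country and city), not one. So let's do it by comparing the cities, on the guess that city is more unique than country: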
select (case when city1 < city2 then country1 else country2 end) as country1,
       (case when city1 < city2 then city1 else city2 end) as city1,
       (case when city1 < city2 then country2 else country1 end) as country2,
       (case when city1 < city2 then city2 else city1 end) as city2,
       count(*)
from dataset
group by (case when city1 < city2 then country1 else country2 end),
         (case when city1 < city2 then city1 else city2 end),
         (case when city1 < city2 then country2 else country1 end),
         (case when city1 < city2 then city2 else city1 end)
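The sketch below runs the answer's CASE query against the sample data, again assuming SQLite through Python's `sqlite3`; the helper names `make_conn` and `canonical_groups` are hypothetical. It also shows where the city-only comparison breaks down: for two hypothetical rows whose sides share the same city name, `city1 < city2` is false for both, so both get swapped and they stay in different groups. The `TIE_BREAK_KEEP` condition is a variant, not part of the answer, that falls back to comparing countries when the cities are equal.

```python
import sqlite3

SAMPLE = [
    (1, "usa", "new york", "france", "paris"),
    (2, "usa", "dallas", "japan", "tokyo"),
    (3, "india", "mumbai", "italy", "rome"),
    (4, "france", "paris", "usa", "new york"),
    (5, "brazil", "sao paulo", "russia", "moscow"),
]

# Hypothetical rows (not from the question): same city name on both sides.
TIES = [
    (6, "usa", "paris", "france", "paris"),
    (7, "france", "paris", "usa", "paris"),
]


def make_conn(rows):
    """Load rows into an in-memory SQLite table named `dataset`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset (user_id INTEGER, country1 TEXT, city1 TEXT, "
        "country2 TEXT, city2 TEXT)"
    )
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def canonical_groups(conn, keep):
    """Group rows by the unordered pair of (country, city) sides.

    `keep` is a SQL condition (a trusted constant, not user input) that is
    true when a row's sides are already in canonical order; otherwise the
    CASE expressions swap the two sides.
    """
    sql = f"""
        SELECT CASE WHEN {keep} THEN country1 ELSE country2 END,
               CASE WHEN {keep} THEN city1 ELSE city2 END,
               CASE WHEN {keep} THEN country2 ELSE country1 END,
               CASE WHEN {keep} THEN city2 ELSE city1 END,
               COUNT(*)
        FROM dataset
        GROUP BY 1, 2, 3, 4   -- SQLite: integers refer to result columns
        ORDER BY 1, 2, 3, 4
    """
    return conn.execute(sql).fetchall()


# Condition used in the answer: compare cities only.
ANSWER_KEEP = "city1 < city2"
# Variant (not from the answer): break ties on country when cities are equal.
TIE_BREAK_KEEP = "(city1 < city2 OR (city1 = city2 AND country1 <= country2))"

sample_groups = canonical_groups(make_conn(SAMPLE), ANSWER_KEEP)
for row in sample_groups:
    print(row)

tie_conn = make_conn(TIES)
answer_ties = canonical_groups(tie_conn, ANSWER_KEEP)      # mirrored rows stay apart
variant_ties = canonical_groups(tie_conn, TIE_BREAK_KEEP)  # mirrored rows merge
```

With the sample data, users 1 and 4 collapse into a single usa / new york / france / paris group with a count of 2. The positional `GROUP BY 1, 2, 3, 4` is only shorthand for repeating the CASE expressions; on a database that does not accept positional grouping, spell the expressions out as the answer does.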