I have a table with the following data:
User#  App
1      A
1      B
2      A
2      B
3      A
I want to find the app overlap between users, so my final result would look like this:
App1  App2  DistinctUseroverlapped
A     A     3
A     B     2
B     B     2
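So the result means that 3 distinct users use App A, 2 users use both App A and App B, and 2 distinct users use App B.
Keep in mind that there are many apps and many users. How can I do this in SQL?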
Answer 0: (score: 2)
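My solution first generates all the app pairs that might be of interest. This is the driver subquery (aliased as pairs).
It then joins in the original data once for each app of the pair.
Finally, it uses count(distinct) to count the distinct users that appear on both sides.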
select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
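To see what the driver subquery contributes by itself, it can be run in isolation. This is only a sketch against the question's sample data, assuming the table is named t as in the query above; the order by is added here purely to make the output easy to read.

```sql
-- Driver subquery on its own: every unordered pair of apps, including (X, X).
-- Assumes the table is named t, as in the query above.
select t1.app as app1, t2.app as app2
from (select distinct app from t) t1
     cross join
     (select distinct app from t) t2
where t1.app <= t2.app   -- keep (A, B) but drop the mirror pair (B, A)
order by app1, app2;
-- On the sample data this returns (A, A), (A, B), (B, B).
```

Because the pairs are built from the distinct apps rather than from the users, a pair with no common users still shows up in the final result, with a count of 0.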
You could also move the conditional comparison out of the count and into the join, and then use a plain count(distinct):
select pairs.app1, pairs.app2,
       -- count from the right-hand table: with a left outer join, unmatched
       -- tleft rows are kept, so counting tleft.user would count every user of app1
       COUNT(distinct tright.user) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2
I prefer the first method because it is more explicit about what is being counted.
This is standard SQL, so it should work in Vertica.
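To try the queries above, the question's sample data could be loaded along these lines. This is a minimal sketch: the table name t and the column names user and app match the queries, but the column types are assumptions, and the column name is quoted because USER is a reserved word in some databases.

```sql
-- Sample data from the question. Column types are assumptions.
-- "user" is quoted because USER is a reserved word in some databases.
create table t (
    "user" integer,
    app    varchar(10)
);

insert into t ("user", app) values (1, 'A');
insert into t ("user", app) values (1, 'B');
insert into t ("user", app) values (2, 'A');
insert into t ("user", app) values (2, 'B');
insert into t ("user", app) values (3, 'A');
```

Single-row inserts are used instead of one multi-row values list so that the script does not depend on multi-row insert support.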
Answer 1: (score: 0)
This works on Vertica 6:
with tab as
     ( select 1 as user, 'A' as App
       union select 1 as user, 'B' as App
       union select 2 as user, 'A' as App
       union select 2 as user, 'B' as App
       union select 3 as user, 'A' as App
     )
   , apps as
     ( select distinct App from tab )
select apps.app as APP1, tab.app as APP2, count(distinct tab.user)
from apps
     join tab t1                -- users of APP1
       on t1.app = apps.app
     join tab                   -- the same users' apps, which supply APP2
       on tab.user = t1.user
      and tab.app >= apps.app   -- keep each pair once, with APP1 <= APP2
group by 1, 2
order by 1, 2