Overlap in SQL

Date: 2013-04-18 21:32:14

Tags: sql join vertica

I have a table with the following data:

User#       App
1       A
1       B
2       A   
2       B
3       A

I want to find the app overlap across distinct users, so my final result should look like this:

App1  App2  DistinctUseroverlapped 
A     A     3
A     B     2
B     B     2

The result means that 3 users use App A, 2 users use both App A and App B, and 2 users use App B.

Keep in mind that there are many apps and users. How can I do this in SQL?
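To pin down the intended semantics before writing any SQL, here is a minimal plain-Python reference (a sketch, not part of the original question): for every app pair with app1 <= app2 it intersects the two apps' sets of users and counts the result. The function name `app_overlap` is illustrative only.

```python
from itertools import combinations_with_replacement

# Sample rows from the question: (user, app)
rows = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")]


def app_overlap(rows):
    """Return {(app1, app2): number of distinct users using both}, with app1 <= app2."""
    users_by_app = {}
    for user, app in rows:
        users_by_app.setdefault(app, set()).add(user)
    apps = sorted(users_by_app)
    # combinations_with_replacement yields (A, A), (A, B), (B, B) for apps [A, B]
    return {
        (a1, a2): len(users_by_app[a1] & users_by_app[a2])
        for a1, a2 in combinations_with_replacement(apps, 2)
    }


for (a1, a2), n in sorted(app_overlap(rows).items()):
    print(a1, a2, n)
# A A 3
# A B 2
# B B 2
```

Note that the diagonal pairs (A, A) and (B, B) are simply the number of users of each app, since a set intersected with itself is unchanged.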

2 answers:

Answer 0 (score: 2)

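My solution first generates all pairs of apps that might be of interest. This is the driver subquery (`pairs`).

It then joins in the original data for each app in the pair.

Finally, it uses count(distinct) to count the distinct users that appear in both lists.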

select pairs.app1, pairs.app2,
       COUNT(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2
group by pairs.app1, pairs.app2
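A quick way to try this query is to run it against the question's sample rows. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for Vertica and a table named `t` with columns `user` and `app`, as the query expects; the `order by` is an addition that only makes the output order deterministic.

```python
import sqlite3

# In-memory SQLite database holding the question's sample data in table `t`.
# SQLite is only a stand-in for Vertica here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user INTEGER, app TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")])

QUERY = """
select pairs.app1, pairs.app2,
       count(distinct case when tleft.user = tright.user then tleft.user end) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('A', 'A', 3), ('A', 'B', 2), ('B', 'B', 2)]
```

Because both joins only filter on app, each pair first produces every (user of app1, user of app2) combination before the CASE picks out the matches, so this form can get expensive for apps with many users.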

You can also move the conditional comparison out of the count and into the join, and then use a plain count(distinct):

select pairs.app1, pairs.app2,
       COUNT(distinct tright.user) as NumCommonUsers  -- tright.user is NULL when the user lacks app2
from (select t1.app as app1, t2.app as app2
      from (select distinct app
            from t
           ) t1 cross join
           (select distinct app
            from t
           ) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft
     on tleft.app = pairs.app1 left outer join
     t tright
     on tright.app = pairs.app2 and
        tright.user = tleft.user
group by pairs.app1, pairs.app2
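The sketch below runs this join-based variant on the same sample rows, again assuming SQLite as a stand-in for Vertica. It counts both `tright.user` and `tleft.user` to show why the matched side is the one to count: because of the LEFT JOIN, every `tleft` row survives even when that user has no row for app2, so counting `tleft.user` reports all users of app1 (3 for A/B instead of 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user INTEGER, app TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A")])

# {side} is filled in with "tright" or "tleft" to compare the two choices.
TEMPLATE = """
select pairs.app1, pairs.app2, count(distinct {side}.user) as NumCommonUsers
from (select t1.app as app1, t2.app as app2
      from (select distinct app from t) t1 cross join
           (select distinct app from t) t2
      where t1.app <= t2.app
     ) pairs left outer join
     t tleft on tleft.app = pairs.app1 left outer join
     t tright on tright.app = pairs.app2 and tright.user = tleft.user
group by pairs.app1, pairs.app2
order by pairs.app1, pairs.app2
"""

by_tright = conn.execute(TEMPLATE.format(side="tright")).fetchall()
by_tleft = conn.execute(TEMPLATE.format(side="tleft")).fetchall()
print(by_tright)  # [('A', 'A', 3), ('A', 'B', 2), ('B', 'B', 2)]
print(by_tleft)   # [('A', 'A', 3), ('A', 'B', 3), ('B', 'B', 2)]
```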

I prefer the first approach because it makes it more explicit what is being counted.

This is standard SQL, so it should work in Vertica.

Answer 1 (score: 0)

This works in Vertica 6:

 with tab as 
    ( select 1 as user,'A' as App
    union  select 1 as user,'B' as App
    union select 2 as user,'A' as App
    union select 2 as user,'B' as App
    union select 3 as user,'A' as App
    )
    select a.app as APP1, b.app as APP2, count(distinct a.user) as DistinctUserOverlapped
    from tab a
    join tab b on a.user = b.user   -- pair up apps used by the same user
              and a.app <= b.app    -- keep each unordered app pair once
    group by 1,2
    order by 1,2
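To check that a query measures overlap rather than simply the number of users of App2, it helps to test on data where those two numbers differ. The sketch below, again using SQLite as a stand-in for Vertica, adds a hypothetical user 4 who uses only B and computes the overlap with a self-join of `tab` on `user`; A/B should stay at 2 while B/B rises to 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (user INTEGER, app TEXT)")
# Question's sample rows plus a hypothetical user 4 who uses only app B.
conn.executemany("INSERT INTO tab VALUES (?, ?)",
                 [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"), (4, "B")])

# Self-join on user: each joined row is one user who uses both a.app and b.app.
QUERY = """
select a.app as APP1, b.app as APP2, count(distinct a.user) as DistinctUserOverlapped
from tab a join tab b
  on a.user = b.user and a.app <= b.app
group by a.app, b.app
order by a.app, b.app
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('A', 'A', 3), ('A', 'B', 2), ('B', 'B', 3)]
```

With a self-join, app pairs that share no users produce no row at all, whereas the driver-subquery approach in Answer 0 returns such pairs with a count of 0.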