Count the number of orders shared between customers

Time: 2017-03-01 20:39:08

Tags: sql r dplyr plyr sqldf

I have a table with two columns:

Order | CustomerID
A     | C1
B     | C1
C     | C1
D     | C2
B     | C3
C     | C3
D     | C4

The real table is very long. I would like output that shows:

C1 | C3 | 2   # Customer C1 and Customer C3 share 2 orders (i.e. orders B and C)
C1 | C2 | 0   # Customer C1 and Customer C2 share 0 orders
C2 | C4 | 1   # Customer C2 and Customer C4 share 1 order (i.e. order D)
C2 | C3 | 0   # Customer C2 and Customer C3 share 0 orders

4 answers:

Answer 0 (score: 5)

select 
    a.CustomerId
  , b.CustomerId
  , sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId

Test setup: http://rextester.com/ISSCL35174

Returns:

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C2         |            0 |
| C1         | C3         |            2 |
| C2         | C3         |            0 |
| C1         | C4         |            0 |
| C2         | C4         |            1 |
| C3         | C4         |            0 |
+------------+------------+--------------+

To return only the pairs that share at least one order:

select a.CustomerId
     , b.CustomerId
     , count(*) as SharedOrders
from t as a
  inner join t as b
    on a.CustomerId < b.CustomerId
   and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId

Returns:

+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1         | C3         |            2 |
| C2         | C4         |            1 |
+------------+------------+--------------+

Answer 1 (score: 4)

Here is a base R method using table, crossprod, combn, and matrix subsetting.

# get counts of customer IDs
myMat <- crossprod(with(df, table(Order, CustomerID)))
myMat
          CustomerID
CustomerID C1 C2 C3 C4
        C1  3  0  2  0
        C2  0  1  0  1
        C3  2  0  2  0
        C4  0  1  0  1

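Note that the diagonal holds each customer's total number of orders, while the (symmetric) off-diagonal entries hold the number of orders shared by each pair of customers.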

# get all customer pairs
customers <- t(combn(rownames(myMat), 2))

# use matrix subsetting to pull out order counts and cbind.data.frame to put it together
cbind.data.frame(customers, myMat[customers])
   1  2 myMat[customers]
1 C1 C2                0
2 C1 C3                2
3 C1 C4                0
4 C2 C3                0
5 C2 C4                1
6 C3 C4                0

If desired, wrap this in setNames to give the columns specific variable names:

setNames(cbind.data.frame(customers, myMat[customers]), c("c1", "c2", "counts"))

Data

df <- 
structure(list(Order = c("A", "B", "C", "D", "B", "C", "D"), 
    CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4")), .Names = c("Order", 
"CustomerID"), class = "data.frame", row.names = c(NA, -7L))
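The reading of myMat given above (diagonal = orders per customer, off-diagonal = shared orders) can be checked directly. The following is a minimal sketch, assuming each (Order, CustomerID) combination appears only once in the data; with duplicate rows the table would hold counts above 1 and the products would no longer count distinct shared orders. The helper sharedOrders is a hypothetical name introduced only for this check.

```r
# Sample data from the question (one row per order/customer combination)
df <- data.frame(
  Order      = c("A", "B", "C", "D", "B", "C", "D"),
  CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4"),
  stringsAsFactors = FALSE
)

# Order-by-customer incidence matrix: 1 where the customer holds the order
inc <- unclass(table(df$Order, df$CustomerID))

# crossprod(inc) computes t(inc) %*% inc, so entry [i, j] sums
# inc[order, i] * inc[order, j] over all orders
myMat <- crossprod(inc)

# Hypothetical helper: count shared orders for one pair by set intersection
sharedOrders <- function(a, b) {
  length(intersect(df$Order[df$CustomerID == a],
                   df$Order[df$CustomerID == b]))
}

# Diagonal: orders per customer; off-diagonal: orders shared by the pair
stopifnot(myMat["C1", "C1"] == sum(df$CustomerID == "C1"))
stopifnot(myMat["C1", "C3"] == sharedOrders("C1", "C3"))
stopifnot(myMat["C2", "C4"] == sharedOrders("C2", "C4"))
```

Each product inc[order, i] * inc[order, j] is 1 only when both customers hold that order, which is why summing over orders yields the shared count.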

Answer 2 (score: 1)

SQL Server demo (but the code is generic):

; with data as (select 'A' as [Order], 'C1' as CustomerID 
                union all 
                select 'B', 'C1'
                union all 
                select 'C', 'C1'
                union all 
                select 'D', 'C2'
                union all 
                select 'B', 'C3'
                union all 
                select 'C', 'C3'
                union all 
                select 'D', 'C4'
        )
select c1, c2, count(*) from (
select x.[Order], x.CustomerID c1, y.CustomerID c2
from data x join data y on x.[Order] = y.[Order] and x.CustomerID < y.CustomerID
) temp
group by c1, c2

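This considers only pairs that share at least one order. I think returning pairs that share no orders would be a waste of resources.

Answer 3 (score: 1)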
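Since the question is also tagged r, the same inner self-join can be sketched in base R with merge. This is only an illustration under the assumption that each (Order, CustomerID) row is unique; the column names c1, c2, and n are chosen here and do not come from the answer.

```r
# Sample data from the question
df <- data.frame(
  Order      = c("A", "B", "C", "D", "B", "C", "D"),
  CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4"),
  stringsAsFactors = FALSE
)

# Self-join on Order: one row per (order, customer, customer) combination
joined <- merge(df, df, by = "Order", suffixes = c(".x", ".y"))

# Keep each unordered pair once, mirroring x.CustomerID < y.CustomerID
joined <- joined[joined$CustomerID.x < joined$CustomerID.y, ]

# Count one per surviving row; pairs with nothing in common never appear
joined$n <- 1L
shared <- aggregate(n ~ CustomerID.x + CustomerID.y, data = joined, FUN = sum)
names(shared)[1:2] <- c("c1", "c2")
shared
```

As with the SQL version, the intermediate join only grows with the number of customers that actually share an order, which is the resource argument made above.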

I would use a cross join to get all pairs of customers and then left joins to bring in the orders. The final step is aggregation:

-- Order is a reserved word, so it is bracketed as in the other answers
select c1.CustomerId, c2.CustomerId, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
     (select distinct CustomerID from t) c2 left join
     t t1
     on t1.CustomerId = c1.CustomerId left join
     t t2
     on t2.CustomerId = c2.CustomerId and
        t2.[Order] = t1.[Order]
where c1.CustomerId < c2.CustomerId
group by c1.CustomerId, c2.CustomerId;

The process is a bit tricky because you also want the pairs that have no orders in common.
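For readers working in R, the cross join plus left join steps can be mirrored in base R. This is a rough sketch rather than a translation of the query above: expand.grid stands in for the cross join, merge with all.x = TRUE stands in for the left joins, and the helper column hit is an assumption introduced here so that unmatched rows can be counted as zero.

```r
# Sample data from the question
df <- data.frame(
  Order      = c("A", "B", "C", "D", "B", "C", "D"),
  CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4"),
  stringsAsFactors = FALSE
)

# Cross join of the distinct customers, then keep each pair once (c1 < c2)
custs <- sort(unique(df$CustomerID))
pairs <- expand.grid(c1 = custs, c2 = custs, stringsAsFactors = FALSE)
pairs <- pairs[pairs$c1 < pairs$c2, ]

# First left join: attach the orders of the first customer in each pair
withOrders <- merge(pairs, df, by.x = "c1", by.y = "CustomerID", all.x = TRUE)

# Second left join: flag orders that the second customer also holds
t2 <- data.frame(c2 = df$CustomerID, Order = df$Order, hit = 1L,
                 stringsAsFactors = FALSE)
matched <- merge(withOrders, t2, by = c("c2", "Order"), all.x = TRUE)

# Unmatched rows come back as NA; set them to 0 so aggregate() keeps the pair
matched$hit[is.na(matched$hit)] <- 0L

inCommon <- aggregate(hit ~ c1 + c2, data = matched, FUN = sum)
inCommon
```

The NA replacement is the tricky part: the formula interface of aggregate drops rows with missing values by default, so pairs with nothing in common would silently disappear without it.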