I have a table with two columns:
Order | CustomerID
1. A | C1
2. B | C1
3. C | C1
4. D | C2
5. B | C3
6. C | C3
7. D | C4
The table is very long. I want an output that shows:
C1 | C3 | 2 # Customer C1 and Customer C3 share 2 orders (i.e. orders B & C)
C1 | C2 | 0 # Customer C1 and Customer C2 share 0 orders
C2 | C4 | 1 # Customer C2 and Customer C4 share 1 order (i.e. order D)
C2 | C3 | 0 # Customer C2 and Customer C3 share 0 orders
Answer 0 (score: 5)
select
a.CustomerId
, b.CustomerId
, sum(case when a.[Order] = b.[Order] then 1 else 0 end) as SharedOrders
from t as a
inner join t as b
on a.CustomerId < b.CustomerId
group by a.CustomerId, b.CustomerId
Test setup: http://rextester.com/ISSCL35174
Returns:
+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1 | C2 | 0 |
| C1 | C3 | 2 |
| C2 | C3 | 0 |
| C1 | C4 | 0 |
| C2 | C4 | 1 |
| C3 | C4 | 0 |
+------------+------------+--------------+
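For readers without a SQL Server instance, the conditional-sum query above can be tried with Python's built-in sqlite3 module. This is only a sketch, not part of the original answer: the table name t comes from the answer, while the in-memory database, the Python variable names, the c1/c2 aliases, and the added ORDER BY are assumptions made for illustration. The reserved word Order is written in double quotes, the standard SQL form, instead of brackets.

```python
import sqlite3

# Sample rows from the question: (Order, CustomerID).
rows = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute('CREATE TABLE t ("Order" TEXT, CustomerID TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Same shape as the query above: pair every row of a with every row of b
# where a.CustomerID < b.CustomerID, then count the row pairs whose orders match.
all_pairs = conn.execute('''
    SELECT a.CustomerID AS c1,
           b.CustomerID AS c2,
           SUM(CASE WHEN a."Order" = b."Order" THEN 1 ELSE 0 END) AS SharedOrders
    FROM t AS a
    INNER JOIN t AS b
        ON a.CustomerID < b.CustomerID
    GROUP BY a.CustomerID, b.CustomerID
    ORDER BY c1, c2
''').fetchall()

for row in all_pairs:
    print(row)
```

Because the join condition only compares customer IDs, every pair of customers survives the join, which is why pairs with zero shared orders show up in the result.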
To return only the pairs that share orders:
select a.CustomerId
, b.CustomerId
, count(*) as SharedOrders
from t as a
inner join t as b
on a.CustomerId < b.CustomerId
and a.[Order] = b.[Order]
group by a.CustomerId, b.CustomerId
Returns:
+------------+------------+--------------+
| CustomerId | CustomerId | SharedOrders |
+------------+------------+--------------+
| C1 | C3 | 2 |
| C2 | C4 | 1 |
+------------+------------+--------------+
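The equality join can also be read as: group the customers by order, then count every pair of customers that appears under the same order. Below is a rough standard-library Python sketch of that reading. The function name shared_order_counts is hypothetical, and the data is the question's sample. One difference from count(*): the sets collapse duplicate (order, customer) rows, which the SQL would count twice.

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_order_counts(rows):
    """Count shared orders per customer pair; pairs sharing nothing are absent."""
    customers_by_order = defaultdict(set)
    for order, customer in rows:
        customers_by_order[order].add(customer)

    counts = Counter()
    for customers in customers_by_order.values():
        # sorted() makes each pair come out as (smaller, larger),
        # mirroring a.CustomerId < b.CustomerId in the join.
        for pair in combinations(sorted(customers), 2):
            counts[pair] += 1
    return counts

rows = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]
shared = shared_order_counts(rows)
print(sorted(shared.items()))
```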
Answer 1 (score: 4)
Here is a base R approach using table, crossprod, combn, and matrix subsetting.
# get counts of customer IDs
myMat <- crossprod(with(df, table(Order, CustomerID)))
myMat
          CustomerID
CustomerID C1 C2 C3 C4
        C1  3  0  2  0
        C2  0  1  0  1
        C3  2  0  2  0
        C4  0  1  0  1
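Note that the diagonal holds each customer's total number of orders, while the (symmetric) off-diagonal entries hold the number of orders shared by each pair of customers.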
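To see why the cross product has this structure, here is a small pure-Python sketch that builds the order-by-customer indicator matrix and multiplies its transpose by itself. It is an illustration only, not the answer's method: the variable names are made up, and it uses 0/1 indicators where table() would count repeated rows (the two agree on the sample data, which has no duplicates).

```python
# Sample rows from the question: (Order, CustomerID).
rows = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]

orders = sorted({order for order, _ in rows})
customers = sorted({customer for _, customer in rows})

# Indicator matrix: one row per order, one column per customer,
# 1 where that customer placed that order.
placed = set(rows)
incidence = [[1 if (o, c) in placed else 0 for c in customers] for o in orders]

# Transpose times matrix: entry [i][j] sums, over all orders, the product of
# customer i's and customer j's indicators, i.e. the orders both placed.
n = len(customers)
cross = [[sum(r[i] * r[j] for r in incidence) for j in range(n)]
         for i in range(n)]

for name, line in zip(customers, cross):
    print(name, line)
```

On the diagonal, i equals j, so the sum counts the orders a single customer placed; off the diagonal, a term is 1 only when both customers placed that order.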
# get all customer pairs
customers <- t(combn(rownames(myMat), 2))
# use matrix subsetting to pull out order counts and cbind.data.frame to put it together
cbind.data.frame(customers, myMat[customers])
1 2 myMat[customers]
1 C1 C2 0
2 C1 C3 2
3 C1 C4 0
4 C2 C3 0
5 C2 C4 1
6 C3 C4 0
If desired, wrap this in setNames to give the columns specific variable names:
setNames(cbind.data.frame(customers, myMat[customers]), c("c1", "c2", "counts"))
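For comparison, the combn-plus-matrix-subsetting step maps onto itertools.combinations and a nested lookup in Python. This is a sketch under assumptions: the nested dict my_mat is a hand-copied stand-in for the myMat output shown earlier, and the names my_mat and pair_counts are hypothetical.

```python
from itertools import combinations

customers = ["C1", "C2", "C3", "C4"]

# Co-occurrence counts copied by hand from the myMat output.
my_mat = {
    "C1": {"C1": 3, "C2": 0, "C3": 2, "C4": 0},
    "C2": {"C1": 0, "C2": 1, "C3": 0, "C4": 1},
    "C3": {"C1": 2, "C2": 0, "C3": 2, "C4": 0},
    "C4": {"C1": 0, "C2": 1, "C3": 0, "C4": 1},
}

# combinations() yields each unordered pair once, like t(combn(..., 2));
# the lookup plays the role of indexing the matrix by a two-column matrix.
pair_counts = [(c1, c2, my_mat[c1][c2]) for c1, c2 in combinations(customers, 2)]

for row in pair_counts:
    print(row)
```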
Data
df <-
structure(list(Order = c("A", "B", "C", "D", "B", "C", "D"),
CustomerID = c("C1", "C1", "C1", "C2", "C3", "C3", "C4")), .Names = c("Order",
"CustomerID"), class = "data.frame", row.names = c(NA, -7L))
Answer 2 (score: 1)
SQL Server demo (but the code is generic):
; with data as (select 'A' as [Order], 'C1' as CustomerID
union all
select 'B', 'C1'
union all
select 'C', 'C1'
union all
select 'D', 'C2'
union all
select 'B', 'C3'
union all
select 'C', 'C3'
union all
select 'D', 'C4'
)
select c1, c2, count(*) from (
select x.[Order], x.CustomerID c1, y.CustomerID c2
from data x join data y on x.[Order] = y.[Order] and x.CustomerID < y.CustomerID
) temp
group by c1, c2
This only considers pairs that share at least one order. I think returning pairs that share no orders would be a waste of resources.
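To put a rough number on that concern: n distinct customers form n(n-1)/2 pairs, no matter how few of them share anything. A tiny Python sketch follows; the helper name total_pairs and the 10,000-customer figure are made up for illustration.

```python
from math import comb

def total_pairs(n_customers):
    """Number of unordered customer pairs, i.e. n choose 2."""
    return comb(n_customers, 2)

# The question's sample has 4 customers: 6 pairs, only 2 of which share an order.
print(total_pairs(4))
# A hypothetical table with 10,000 customers already has about 50 million pairs.
print(total_pairs(10_000))
```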
Answer 3 (score: 1)
I would use a cross join to get all pairs of customers, and then left join to bring in the orders. The final step is the aggregation:
select c1.CustomerId, c2.CustomerId, count(t2.[Order]) as inCommon
from (select distinct CustomerID from t) c1 cross join
(select distinct CustomerID from t) c2 left join
t t1
on t1.CustomerId = c1.CustomerId left join
t t2
on t2.CustomerId = c2.CustomerId and
t2.[Order] = t1.[Order]
where c1.CustomerId < c2.CustomerId
group by c1.CustomerId, c2.CustomerId;
This process is a bit tricky because you want the pairs with no orders in common.
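The cross join plus left join approach can be tried the same way with Python's sqlite3 module. This is a sketch under assumptions, not the answer's exact code: it runs on SQLite instead of SQL Server, quotes the reserved word Order with double quotes, and adds c1/c2 output aliases and an ORDER BY so the result is deterministic.

```python
import sqlite3

rows = [("A", "C1"), ("B", "C1"), ("C", "C1"), ("D", "C2"),
        ("B", "C3"), ("C", "C3"), ("D", "C4")]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute('CREATE TABLE t ("Order" TEXT, CustomerID TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# 1. cross join the distinct customers to get every pair,
# 2. left join c1's orders, then c2's rows for the same order,
# 3. count only the non-NULL matches, so unmatched pairs come out as 0.
in_common = conn.execute('''
    SELECT c1.CustomerID AS c1,
           c2.CustomerID AS c2,
           COUNT(t2."Order") AS inCommon
    FROM (SELECT DISTINCT CustomerID FROM t) AS c1
    CROSS JOIN (SELECT DISTINCT CustomerID FROM t) AS c2
    LEFT JOIN t AS t1
        ON t1.CustomerID = c1.CustomerID
    LEFT JOIN t AS t2
        ON t2.CustomerID = c2.CustomerID
       AND t2."Order" = t1."Order"
    WHERE c1.CustomerID < c2.CustomerID
    GROUP BY c1.CustomerID, c2.CustomerID
    ORDER BY 1, 2
''').fetchall()

for row in in_common:
    print(row)
```

COUNT over a column skips NULLs, which is what lets a pair with no matching t2 rows still appear with a count of 0.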