Question

In the sample table t0:

| OrderID | ProductID |
|---------|-----------|
| 0001    | 1254      |
| 0001    | 1252      |
| 0002    | 0038      |
| 0003    | 1254      |
| 0003    | 1252      |
| 0003    | 1432      |
| 0004    | 0038      |
| 0004    | 1254      |
| 0004    | 1252      |

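I need to find the most popular combinations of two ProductIDs within a single OrderID. The goal is to decide which products are more likely to be sold together in one order, e.g. phone and hands-free kit. I think the logic is to group by OrderID, generate every possible combination of ProductID pairs, count how many orders contain each pair and select the TOP 2, but I really can't tell whether that is feasible.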
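A minimal Python sketch of the logic described above, done outside the database: collect the distinct products of each order, enumerate the unordered pairs, and count how many orders contain each pair. The function name `pair_counts` is hypothetical, and the IDs are kept as strings only to preserve the leading zeros shown in t0.

```python
from collections import Counter
from itertools import combinations

# Sample rows from table t0 in the question, as (OrderID, ProductID).
# IDs are kept as strings (an assumption) to preserve the leading zeros.
rows = [
    ("0001", "1254"), ("0001", "1252"),
    ("0002", "0038"),
    ("0003", "1254"), ("0003", "1252"), ("0003", "1432"),
    ("0004", "0038"), ("0004", "1254"), ("0004", "1252"),
]


def pair_counts(rows):
    """For each unordered product pair, count the orders that contain both."""
    products_by_order = {}
    for order_id, product_id in rows:
        # One set per order, so a product repeated on several lines counts once.
        products_by_order.setdefault(order_id, set()).add(product_id)

    counts = Counter()
    for products in products_by_order.values():
        # Sorting gives every pair one canonical (smaller, larger) form,
        # so ("1252", "1254") and ("1254", "1252") share a key.
        for pair in combinations(sorted(products), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(rows)
print(counts[("1252", "1254")])  # 3
# Every other pair occurs once, so a strict "TOP 2" has a four-way tie for second place.
print(sorted(counts.values()))  # [1, 1, 1, 1, 3]
```

The same three steps (distinct products per order, canonical pairs, count per pair) are what the SQL self-joins in the answers express.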

Answer 1

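A "self join" can be used, but make sure one of the product IDs is greater than the other, so that each order yields each "pair" of products only once. Then the counting is easy: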

Demo

CREATE TABLE OrderDetail
    ([OrderID] int, [ProductID] int)
;

INSERT INTO OrderDetail
    ([OrderID], [ProductID])
VALUES
    (0001, 1254), (0001, 1252), (0002, 0038), (0003, 1254), (0003, 1252), (0003, 1432), (0004, 0038), (0004, 1254), (0004, 1252)
;

Query 1:

select -- top(2)
      od1.ProductID, od2.ProductID, count(*) count_of
from OrderDetail od1
inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
group by
      od1.ProductID, od2.ProductID
order by
      count_of DESC

Results:

| ProductID | ProductID | count_of |
|-----------|-----------|----------|
|      1252 |      1254 |        3 |
|      1252 |      1432 |        1 |
|      1254 |      1432 |        1 |
|        38 |      1252 |        1 |
|        38 |      1254 |        1 |
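The demo above uses SQL Server syntax. As a hedged cross-check, this sketch runs the same Query 1 against SQLite through Python's standard `sqlite3` module, with tie-breaking columns added to `ORDER BY` (an addition, not part of the answer) so the order of tied rows is deterministic. The helper `self_join_rows` is hypothetical; it counts the rows the self-join produces for order 3 under different join conditions, which shows why the `>` condition is needed.

```python
import sqlite3

# SQLite (via Python's standard sqlite3 module) stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetail (OrderID INTEGER, ProductID INTEGER)")
conn.executemany(
    "INSERT INTO OrderDetail (OrderID, ProductID) VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252)],
)

# Query 1 from the answer. The extra ORDER BY columns are an addition that
# makes the order of tied rows deterministic.
results = conn.execute("""
SELECT od1.ProductID, od2.ProductID, COUNT(*) AS count_of
FROM OrderDetail od1
JOIN OrderDetail od2
  ON od1.OrderID = od2.OrderID AND od2.ProductID > od1.ProductID
GROUP BY od1.ProductID, od2.ProductID
ORDER BY count_of DESC, od1.ProductID, od2.ProductID
""").fetchall()
for row in results:
    print(row)


def self_join_rows(order_id, extra_condition):
    """Rows produced by self-joining one order under an extra join condition."""
    sql = ("SELECT COUNT(*) FROM OrderDetail od1 "
           "JOIN OrderDetail od2 ON od1.OrderID = od2.OrderID "
           "WHERE od1.OrderID = ? " + extra_condition)
    return conn.execute(sql, (order_id,)).fetchone()[0]


# Order 3 has three products:
print(self_join_rows(3, ""))  # 9: every product with every product, itself included
print(self_join_rows(3, "AND od2.ProductID <> od1.ProductID"))  # 6: each pair twice
print(self_join_rows(3, "AND od2.ProductID > od1.ProductID"))  # 3: each pair once
```

The counts match the results table; only the order of the tied rows differs, because the answer's `ORDER BY count_of DESC` leaves ties unordered.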

----

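As for displaying the "top 2" or whatever: you may get tied "top" results, so I suggest you use dense_rank(), and you may even want to "unpivot" the results so you have one column of product IDs with their associated rank. Whether you do this often and/or store the result, I leave to you.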

with ProductPairs as (
      select 
             p1, p2, count_pair
          , dense_rank() over(order by count_pair DESC) as ranked
      from (
            select
                  od1.ProductID p1, od2.ProductID p2, count(*) count_pair
            from OrderDetail od1
            inner join OrderDetail od2 on od1.OrderID = od2.OrderID and od2.ProductID > od1.ProductID
            group by
                  od1.ProductID, od2.ProductID
            ) d
      )
, RankedProducts as (
       select p1 as ProductID, ranked, count_pair
       from ProductPairs
       union all
       select p2 as ProductID, ranked, count_pair
       from ProductPairs
       )
select *
from RankedProducts
where ranked <= 2
order by ranked, ProductID
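To make the tie concrete, here is a sketch that runs only the `ProductPairs` CTE from the query above, at pair level, on the sample data. It uses SQLite via Python's `sqlite3` module and assumes the bundled SQLite is version 3.25 or newer (the first with window functions such as `DENSE_RANK`); the final `ORDER BY` on `p1, p2` is an addition.

```python
import sqlite3

# Assumes the SQLite bundled with Python supports window functions (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderDetail (OrderID INTEGER, ProductID INTEGER)")
conn.executemany(
    "INSERT INTO OrderDetail (OrderID, ProductID) VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252)],
)

# The ProductPairs CTE from the answer, queried directly (one row per pair).
ranked_pairs = conn.execute("""
WITH ProductPairs AS (
    SELECT p1, p2, count_pair,
           DENSE_RANK() OVER (ORDER BY count_pair DESC) AS ranked
    FROM (
        SELECT od1.ProductID AS p1, od2.ProductID AS p2, COUNT(*) AS count_pair
        FROM OrderDetail od1
        JOIN OrderDetail od2
          ON od1.OrderID = od2.OrderID AND od2.ProductID > od1.ProductID
        GROUP BY od1.ProductID, od2.ProductID
    ) d
)
SELECT p1, p2, count_pair, ranked
FROM ProductPairs
WHERE ranked <= 2
ORDER BY ranked, p1, p2
""").fetchall()

for row in ranked_pairs:
    print(row)
```

On this data rank 1 is the single pair (1252, 1254) and rank 2 holds four pairs, each seen in one order, which is exactly the kind of tie a plain `TOP 2` would cut arbitrarily.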

Answer 2

Try the following query:

-- orderID is left out of SELECT and GROUP BY so each pair is counted across all orders
SELECT T1.productID, T2.productID, COUNT(*) AS Occurence
FROM TBL T1 INNER JOIN TBL T2
ON T1.orderID = T2.orderID
WHERE T1.productID > T2.productID
GROUP BY T1.productID, T2.productID
ORDER BY Occurence DESC
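To see why the `GROUP BY` columns matter, this hedged sketch (SQLite via Python's `sqlite3`, with a table `TBL` filled from the question's sample rows) runs the self-join grouped two ways: by order and pair, and by pair only. The tie-breaking `ORDER BY` columns are an addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL (orderID INTEGER, productID INTEGER)")
conn.executemany(
    "INSERT INTO TBL VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252)],
)

join = """
FROM TBL T1 JOIN TBL T2 ON T1.orderID = T2.orderID
WHERE T1.productID > T2.productID
"""

# Grouped per order and pair: each pair occurs at most once in a given order.
per_order = conn.execute(
    "SELECT T1.orderID, T1.productID, T2.productID, COUNT(*) AS Occurence"
    + join + "GROUP BY T1.orderID, T1.productID, T2.productID"
).fetchall()

# Grouped per pair only: the number of orders containing both products.
per_pair = conn.execute(
    "SELECT T1.productID, T2.productID, COUNT(*) AS Occurence"
    + join + "GROUP BY T1.productID, T2.productID "
    "ORDER BY Occurence DESC, T1.productID, T2.productID"
).fetchall()

print({row[3] for row in per_order})  # {1}
print(per_pair[0])  # (1254, 1252, 3)
```

Grouping per order turns every `Occurence` into 1 on this data, so the ordering carries no information; grouping by pair alone gives the cross-order counts the question asks for. Because of `T1.productID > T2.productID`, the larger ID comes first in each pair.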


Answer 3

  WITH products as (
       SELECT DISTINCT ProductID
       FROM orders
  ),  permutation as (
      SELECT p1.ProductID as pidA, 
             p2.ProductID as pidB
      FROM products p1
      JOIN products p2
        ON p1.ProductID < p2.ProductID
  ), check_frequency as (
      SELECT pidA, pidB, COUNT (o2.orderID) total_orders
      FROM permutation p
      LEFT JOIN orders o1
        ON p.pidA = o1.ProductID
      LEFT JOIN orders o2
        ON p.pidB = o2.ProductID
       AND o1.orderID = o2.orderID
      GROUP BY pidA, pidB
  )
  SELECT TOP 2 *
  FROM check_frequency
  ORDER BY total_orders DESC
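A sketch of the same CTE chain in SQLite (via Python's `sqlite3`), with the table `orders` filled from the question's sample rows. Two assumptions differ from the answer: the CTE is referenced as `permutation`, the name it is defined with, and because `TOP 2` is SQL Server syntax (SQLite would use `LIMIT 2`), the limit is left off so every pair is printed; the tie-breaking `ORDER BY` columns are also an addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderID INTEGER, ProductID INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252)],
)

rows = conn.execute("""
WITH products AS (
    SELECT DISTINCT ProductID FROM orders
), permutation AS (
    -- every unordered pair of distinct products, whether or not sold together
    SELECT p1.ProductID AS pidA, p2.ProductID AS pidB
    FROM products p1
    JOIN products p2 ON p1.ProductID < p2.ProductID
), check_frequency AS (
    -- COUNT(o2.orderID) ignores the NULLs the LEFT JOIN produces
    SELECT pidA, pidB, COUNT(o2.orderID) AS total_orders
    FROM permutation p
    LEFT JOIN orders o1 ON p.pidA = o1.ProductID
    LEFT JOIN orders o2 ON p.pidB = o2.ProductID AND o1.orderID = o2.orderID
    GROUP BY pidA, pidB
)
SELECT * FROM check_frequency
ORDER BY total_orders DESC, pidA, pidB
""").fetchall()

for row in rows:
    print(row)
```

Unlike the inner self-joins, this version also lists pairs that never appear together; here (38, 1432) comes out with a count of 0.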

Answer 4

The following query counts the number of two-way combinations across all orders in orderline:

SELECT SUM(numprods * (numprods - 1)/2) as numcombo2 
FROM ( SELECT orderid, COUNT(DISTINCT productid) as numprods
      FROM orderline ol 
      GROUP BY orderid ) o
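A small check of this query in SQLite (via Python's `sqlite3`), using the question's sample rows plus one hypothetical duplicate line (product 1252 listed twice in order 4). Per order, n distinct products give n(n - 1)/2 pairs, so the expected total is 1 + 0 + 3 + 3 = 7; the second, line-based query is an illustrative contrast, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderline (orderid INTEGER, productid INTEGER)")
conn.executemany(
    "INSERT INTO orderline VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252),
     (4, 1252)],  # hypothetical duplicate line: product 1252 twice in order 4
)

# The answer's query: counts distinct products per order.
numcombo2 = conn.execute("""
SELECT SUM(numprods * (numprods - 1) / 2) AS numcombo2
FROM (SELECT orderid, COUNT(DISTINCT productid) AS numprods
      FROM orderline
      GROUP BY orderid) o
""").fetchone()[0]
print(numcombo2)  # 7

# Contrast: counting lines instead lets the duplicate inflate order 4 (4 lines -> 6).
lines_based = conn.execute("""
SELECT SUM(n * (n - 1) / 2)
FROM (SELECT COUNT(*) AS n FROM orderline GROUP BY orderid) o
""").fetchone()[0]
print(lines_based)  # 10
```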

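Note that the numcombo2 query counts distinct products rather than order lines, so an order that lists the same product on several lines does not inflate the count. In the book's data, the number of two-way combinations is 185,791. This is useful because the number of combinations largely determines how fast the queries that generate them run. A single order with a large number of products can seriously degrade performance. For example, if one order contains a thousand products, there are about half a million two-way combinations in that one order, compared with 185,791 in all of the order data. As the number of products in the largest order increases, the number of combinations increases even faster.

Each combination has to meet two conditions: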

The two products in the pair are different.
No two combinations contain the same two products.
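The half-million figure in the note above follows from the per-order term of the numcombo2 query, which is the binomial coefficient "n choose 2", where n_o is the number of distinct products in order o. Worked out below (the 2,000-product case is an illustrative addition), doubling the size of one order roughly quadruples its pair count:

```latex
\text{numcombo2} = \sum_{o} \binom{n_o}{2} = \sum_{o} \frac{n_o (n_o - 1)}{2}

\binom{1000}{2} = \frac{1000 \cdot 999}{2} = 499{,}500
\qquad
\binom{2000}{2} = \frac{2000 \cdot 1999}{2} = 1{,}999{,}000
```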

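The combinations are calculated with a self-join on the Orderline table, with duplicate product pairs removed. The goal is to get all pairs of products. The first condition is easily met by filtering out any pair in which the two products are equal. The second condition is also easily met by requiring the first product ID to be less than the second. The following query generates all the combinations in a subquery and counts the number of orders that contain each one: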

SELECT p1, p2, COUNT(*) as numorders
FROM (SELECT op1.orderid, op1.productid as p1, op2.productid as p2
FROM (SELECT DISTINCT orderid, productid FROM orderline) op1 JOIN
(SELECT DISTINCT orderid, productid FROM orderline) op2
ON op1.orderid = op2.orderid AND
op1.productid < op2.productid
) combinations
GROUP BY p1, p2

Source: Data Analysis Using SQL and Excel
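Applied to the question's sample data (again SQLite via Python's `sqlite3`, with the same hypothetical duplicate line for product 1252 in order 4), this sketch of the combinations query shows that the `SELECT DISTINCT` subqueries keep the duplicate line from double-counting. The `ORDER BY` is an addition for deterministic output; the query itself is not from the book beyond what is quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderline (orderid INTEGER, productid INTEGER)")
conn.executemany(
    "INSERT INTO orderline VALUES (?, ?)",
    [(1, 1254), (1, 1252), (2, 38), (3, 1254), (3, 1252), (3, 1432),
     (4, 38), (4, 1254), (4, 1252),
     (4, 1252)],  # hypothetical duplicate line: product 1252 twice in order 4
)

combos = conn.execute("""
SELECT p1, p2, COUNT(*) AS numorders
FROM (SELECT op1.orderid, op1.productid AS p1, op2.productid AS p2
      FROM (SELECT DISTINCT orderid, productid FROM orderline) op1
      JOIN (SELECT DISTINCT orderid, productid FROM orderline) op2
        ON op1.orderid = op2.orderid
       AND op1.productid < op2.productid
     ) combinations
GROUP BY p1, p2
ORDER BY numorders DESC, p1, p2
""").fetchall()

for row in combos:
    print(row)
```

The pair counts match Query 1 from Answer 1, and they sum to 7, the numcombo2 total for the same data.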

One-to-many relationship between table columns: grouping and finding combinations
