Question

I have a table that records the products customers have bought, with one row per purchase:

CustomerID  |  ProductID 
1           |  1000 
1           |  2000 
1           |  3000 
2           |  1000 
3           |  1000 
3           |  3000 
...         |  ...

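I am using the following code to find the ten customers whose purchases overlap most with those of customer #1 (the first result has the largest overlap, and so on):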

SELECT othercustomers.CustomerID, COUNT(DISTINCT othercustomers.ProductID)
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers ON
    thiscustomer.CustomerID != othercustomers.CustomerID
    AND thiscustomer.ProductID = othercustomers.ProductID
WHERE thiscustomer.CustomerID = '1'
GROUP BY othercustomers.CustomerID
ORDER BY COUNT(DISTINCT othercustomers.ProductID) DESC
LIMIT 10

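The code produces the expected output (customer ID + total number of products that overlap with customer #1).

I would now like the query to exclude customers who have bought more than 1000 different products. These customers are bulk buyers purchasing the entire stock, so their purchase history is meaningless when looking for customers with similar tastes.

In other words, if customer #500 has bought more than 1000 different products, I want him/her excluded from the results when searching for customers with tastes similar to customer #1, even if customer #500 has bought all three products that customer #1 bought and would normally rank first in terms of similarity/overlap.

I think some HAVING is in order, but I can't seem to figure out what the appropriate condition is.

Thanks!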

Answer 1

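I don't think HAVING will do what you want here, because in this query it can only see the number of overlapping products, whereas you need the total number of distinct products bought by the other customer.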
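
To make that concrete, here is a sketch of the HAVING attempt that does not work. It is an illustration only; the alias `overlap` is introduced here and is not part of the original query. After the join, the only rows left for each other customer are the ones matching customer #1's products, so the count can never exceed the number of products customer #1 bought.

```sql
-- Illustration only: this HAVING does NOT exclude bulk buyers.
SELECT othercustomers.CustomerID,
       COUNT(DISTINCT othercustomers.ProductID) AS overlap
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers
    ON thiscustomer.CustomerID != othercustomers.CustomerID
    AND thiscustomer.ProductID = othercustomers.ProductID
WHERE thiscustomer.CustomerID = '1'
GROUP BY othercustomers.CustomerID
-- The count covers overlapping products only, so it is at most 3 for the
-- sample data and this condition is always true.
HAVING COUNT(DISTINCT othercustomers.ProductID) <= 1000
ORDER BY overlap DESC
LIMIT 10;
```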

Instead, you can filter with a correlated subquery in the WHERE clause:

SELECT othercustomers.CustomerID, COUNT(DISTINCT othercustomers.ProductID)
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers ON
    thiscustomer.CustomerID != othercustomers.CustomerID
    AND thiscustomer.ProductID = othercustomers.ProductID
WHERE 
    thiscustomer.CustomerID = '1'
    AND (
        SELECT COUNT(DISTINCT ProductID) 
        FROM `purchases` AS p
        WHERE p.CustomerID = othercustomers.CustomerID
    ) <= 1000
GROUP BY othercustomers.CustomerID
ORDER BY COUNT(DISTINCT othercustomers.ProductID) DESC
LIMIT 10
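
If the correlated subquery turns out to be slow, one possible alternative is to compute the list of non-bulk customers once in a derived table and join against it. This is a sketch under assumptions, not a tested replacement: the aliases `regular` and `overlap` are hypothetical names, and the threshold keeps customers with at most 1000 distinct products, matching the "more than 1000" exclusion in the question. Here HAVING does fit, because the inner query groups over each customer's full purchase history.

```sql
SELECT othercustomers.CustomerID,
       COUNT(DISTINCT othercustomers.ProductID) AS overlap
FROM `purchases` AS thiscustomer
JOIN `purchases` AS othercustomers
    ON thiscustomer.CustomerID != othercustomers.CustomerID
    AND thiscustomer.ProductID = othercustomers.ProductID
JOIN (
    -- Customers with at most 1000 distinct products, computed once
    -- over the whole table rather than once per joined row.
    SELECT CustomerID
    FROM `purchases`
    GROUP BY CustomerID
    HAVING COUNT(DISTINCT ProductID) <= 1000
) AS regular
    ON regular.CustomerID = othercustomers.CustomerID
WHERE thiscustomer.CustomerID = '1'
GROUP BY othercustomers.CustomerID
ORDER BY overlap DESC
LIMIT 10;
```

Which variant is faster depends on the data and the optimizer, so it is worth comparing both with EXPLAIN on the real table.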

For performance, you will want an index on purchases(CustomerID, ProductID).
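
A minimal sketch of creating that index, assuming no equivalent index or key already exists on the table (the index name `idx_purchases_customer_product` is a hypothetical choice):

```sql
-- Composite index: lets the per-customer lookups and the DISTINCT ProductID
-- counts be answered from the index.
CREATE INDEX idx_purchases_customer_product
    ON `purchases` (CustomerID, ProductID);
```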
