Identifying the pairs of IDs with the highest number of matches in a column in SQL

Time: 2017-03-03 23:55:39

Tags: mysql sql

I am trying to use MySQL to find the pairs of businesses that have the most customers in common.

The table looks like this:

+------------+------------+ 
| BusinessID | CustomerID |
+------------+------------+
| A          |          1 |
| A          |          2 |
| A          |          3 |
| B          |          4 |
| B          |          1 |
| B          |          3 |
| B          |          2 |
| C          |          3 |
| C          |          4 |
| C          |          5 |
+------------+------------+

I want the output to list each pair of businesses together with the number of customers they have in common, like this:

+-------------+-------------+------------------------+
| BusinessID  | BusinessID  | Common Customers Count |
+-------------+-------------+------------------------+
| A           | B           |                      3 |
| A           | C           |                      1 |
| B           | C           |                      2 |
+-------------+-------------+------------------------+

This is the query I wrote:

SELECT a.BusinessID,b.BusinessID,COUNT(*) AS ncom
FROM (SELECT BusinessID, CustomerID FROM MYTABLE) AS a JOIN       
     (SELECT BusinessID,CustomerID FROM MYTABLE) AS b 
     ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom   

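The problem is that my dataset has about 5 million rows, and this query seems far too inefficient for a dataset that large. I tested it on smaller subsets by limiting the data: 10k rows take 8 seconds and 20k rows take 30 seconds, so running it on 5 million rows is not feasible. How can I write the query so that it runs faster?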

1 Answer:

Answer 0: (score: 1)

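Don't use subqueries just to select the columns from the table; wrapping the table in a derived table may prevent MySQL from using an index on it. Join the table to itself directly instead: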

SELECT a.BusinessID, b.BusinessID, COUNT(*) as ncom
FROM MYTABLE AS a
JOIN MYTABLE AS b ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom

Also, give the table the following index:

CREATE INDEX ix_cust_bus ON MYTABLE (CustomerID, BusinessID);
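
To check whether the index is actually picked up, something like the following could be run on a scratch copy. This is only a sketch: the column types are assumptions (the question does not give the schema), the aliases BusinessID1 and BusinessID2 are made up for readability, and DESC is added so the largest counts come first, which the queries above do not do.

```sql
-- Scratch table with the sample rows from the question.
-- Column types are assumed; adjust them to match the real schema.
CREATE TABLE MYTABLE (
    BusinessID CHAR(1) NOT NULL,
    CustomerID INT NOT NULL
);

INSERT INTO MYTABLE (BusinessID, CustomerID) VALUES
    ('A', 1), ('A', 2), ('A', 3),
    ('B', 4), ('B', 1), ('B', 3), ('B', 2),
    ('C', 3), ('C', 4), ('C', 5);

-- The index suggested above: CustomerID first, because the join
-- matches rows on equal CustomerID and then filters on BusinessID.
CREATE INDEX ix_cust_bus ON MYTABLE (CustomerID, BusinessID);

-- EXPLAIN shows the plan without executing the query.
EXPLAIN
SELECT a.BusinessID AS BusinessID1,   -- hypothetical alias
       b.BusinessID AS BusinessID2,   -- hypothetical alias
       COUNT(*) AS ncom
FROM MYTABLE AS a
JOIN MYTABLE AS b
  ON a.CustomerID = b.CustomerID
 AND a.BusinessID < b.BusinessID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom DESC;                   -- DESC is an addition, not in the original
```

In the EXPLAIN output, the key column should ideally name ix_cust_bus for the joined alias. On a table this small the optimizer may still choose a different plan, so it is worth repeating the check against a realistically sized subset of the data.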