Question

I have a dataset of "collections" (you could also call them groups or wishlists). A collection is a list of items:

    collectionId | itemId
    -------------+-------
             123 | 2345
             123 | 3465
             123 | 876
             123 | 567
             123 | 980

             777 | 980
             777 | 332
             777 | 3465
             777 | 876
             777 | 678
             777 | 567

You can see that items 876 and 980, for example, are contained in both collections (123 and 777), so they form a popular couple/pair.
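A quick way to check which items two collections share is a set intersection. This is a minimal sketch using the sample data above held in a plain dictionary (`collection_items` is a hypothetical name); it shows that 3465 and 567 are shared as well, not only 876 and 980.

```python
# Sample data from the question: collectionId -> set of itemIds
collection_items = {
    123: {2345, 3465, 876, 567, 980},
    777: {980, 332, 3465, 876, 678, 567},
}

# Set intersection keeps every item that appears in both collections
shared = collection_items[123] & collection_items[777]
print(sorted(shared))  # [567, 876, 980, 3465]
```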

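My users generate these collections, and I would like to extract two insights:

which items are the most common (this is easy)

which pairs of items (or groups of more than 2) are the most common (this is my problem)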
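Both insights can be counted in one pass over the collections: count each item once per collection, and count each unordered pair of items once per collection. This is the naive form of what is usually called frequent itemset mining or market-basket analysis. The sketch below assumes the data fits in memory as a dictionary (`collection_items`, `item_counts` and `pair_counts` are hypothetical names); the number of pairs grows quadratically with the size of each collection.

```python
from collections import Counter
from itertools import combinations

# Sample data from the question: collectionId -> list of itemIds
collection_items = {
    123: [2345, 3465, 876, 567, 980],
    777: [980, 332, 3465, 876, 678, 567],
}

item_counts = Counter()  # itemId -> number of collections containing it
pair_counts = Counter()  # (itemA, itemB) -> number of collections containing both

for items in collection_items.values():
    # Deduplicate and sort so each pair is always stored as (smaller, larger)
    unique = sorted(set(items))
    item_counts.update(unique)
    # Count each unordered pair in this collection once;
    # combinations(unique, 3) would count triples instead
    pair_counts.update(combinations(unique, 2))

print(item_counts.most_common(3))
print(pair_counts.most_common(3))
```

With this sample every item shared by both collections has a count of 2, so the six pairs formed from 567, 876, 980 and 3465 are tied as the most common pairs.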

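For example:

Say many wishlists contain an iPhone and a pink iPhone cover, among other accessories. I want to extract the fact that iPhone + pink iPhone cover is a common "couple".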

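In short, I am basically trying to do what Amazon does: if you look at an iPhone, I want to suggest a pink iPhone cover, because many other users have suggested/preferred that combination.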
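The "if you look at X, suggest Y" idea can be sketched as a co-occurrence table: for each item, count how many collections it shares with every other item, then suggest the items with the highest counts. This is a simplified illustration under the assumption that raw co-occurrence counts are good enough; `co_occurs` and `recommend` are hypothetical names, not an existing API.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Sample data from the question: collectionId -> list of itemIds
collection_items = {
    123: [2345, 3465, 876, 567, 980],
    777: [980, 332, 3465, 876, 678, 567],
}

# co_occurs[a][b] = number of collections that contain both a and b
co_occurs = defaultdict(Counter)
for items in collection_items.values():
    # permutations yields both (a, b) and (b, a), so lookups work from either side
    for a, b in permutations(set(items), 2):
        co_occurs[a][b] += 1


def recommend(item_id, n=3):
    """Return up to n items that most often share a collection with item_id."""
    return [other for other, _ in co_occurs.get(item_id, Counter()).most_common(n)]


print(recommend(876))
```

For item 876 this suggests 3465, 567 and 980 (in some order), since each of them appears alongside 876 in two collections.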

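Do I first have to compare how similar the collections are? See how many items they have in common? Measure similarity as a ratio or an index?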
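Counting item pairs does not require comparing collections first, but if a similarity ratio between collections is wanted, one common choice (my suggestion, not something from the question) is the Jaccard index: shared items divided by distinct items. A minimal sketch, where `jaccard` is a hypothetical helper:

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|: 0.0 for disjoint, 1.0 for identical sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # treat two empty collections as having no similarity
    return len(a & b) / len(a | b)


c123 = [2345, 3465, 876, 567, 980]
c777 = [980, 332, 3465, 876, 678, 567]
print(jaccard(c123, c777))  # 4 shared / 7 distinct items = 0.571...
```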

What is the best way to do this with MySQL? Do I also need PHP?

Update

In PHP

I would probably do something like this loop:

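    // $collections: collectionId => array of itemIds, loaded from MySQL beforehand
    $ids = array_keys($collections);
    $matches = [];

    // Compare collection 1 with 2, 3, ...; then collection 2 with 3, 4, ...; and so on
    for ($i = 0; $i < count($ids); $i++) {
        for ($j = $i + 1; $j < count($ids); $j++) {
            // Items shared by collection $ids[$i] and collection $ids[$j]
            $shared = array_intersect($collections[$ids[$i]], $collections[$ids[$j]]);
            // Store the matching items for this pair of collections
            $matches[$ids[$i]][$ids[$j]] = array_values($shared);
        }
    }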

Answer 1

For two collections, you can do a join:

select a.itemId
from my_table a
join my_table b on a.itemId = b.itemId
where a.collectionId = 123
and b.collectionId = 777
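To try the join without a MySQL server, the sketch below loads the sample data into an in-memory SQLite database (a stand-in for MySQL; this simple join is written the same way in both) and runs the query with the two collection ids as parameters. The table name `my_table` comes from the answer; the column types are assumptions.

```python
import sqlite3

# Sample data from the question as (collectionId, itemId) rows
rows = [
    (123, 2345), (123, 3465), (123, 876), (123, 567), (123, 980),
    (777, 980), (777, 332), (777, 3465), (777, 876), (777, 678), (777, 567),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (collectionId INTEGER, itemId INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# Items present in both collection ? and collection ?
shared = conn.execute(
    """
    SELECT a.itemId
    FROM my_table a
    JOIN my_table b ON a.itemId = b.itemId
    WHERE a.collectionId = ? AND b.collectionId = ?
    ORDER BY a.itemId
    """,
    (123, 777),
).fetchall()

result = [item for (item,) in shared]
print(result)  # [567, 876, 980, 3465]
```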

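For all collections at once, you can try a Cartesian product of the table with itself (this can be extended to 3 or more copies of the table):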

select a.itemId
from my_table a
cross join my_table b
where a.itemId = b.itemId
and a.collectionId <> b.collectionId
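Note that the cross join above returns items that appear in more than one collection (one row per ordered pair of collections sharing the item), not pairs of items. To count pairs of items, one option is to join the table to itself on the same collection and group by the item pair. The sketch below is my own variation, again run on SQLite as a stand-in for MySQL; `COUNT(DISTINCT a.collectionId)` guards against duplicate rows, and `a.itemId < b.itemId` keeps one row per unordered pair.

```python
import sqlite3

# Sample data from the question as (collectionId, itemId) rows
rows = [
    (123, 2345), (123, 3465), (123, 876), (123, 567), (123, 980),
    (777, 980), (777, 332), (777, 3465), (777, 876), (777, 678), (777, 567),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (collectionId INTEGER, itemId INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# Pair every item with every other item in the same collection,
# then count in how many collections each pair occurs
pairs = conn.execute(
    """
    SELECT a.itemId AS item1,
           b.itemId AS item2,
           COUNT(DISTINCT a.collectionId) AS times_together
    FROM my_table a
    JOIN my_table b
      ON a.collectionId = b.collectionId
     AND a.itemId < b.itemId
    GROUP BY a.itemId, b.itemId
    HAVING COUNT(DISTINCT a.collectionId) > 1
    ORDER BY times_together DESC, item1, item2
    """
).fetchall()

for item1, item2, times in pairs:
    print(item1, item2, times)
```

On the sample data this returns the six pairs formed from 567, 876, 980 and 3465, each found together in 2 collections. Adding a third self-join (with `b.itemId < c.itemId`) would count triples the same way.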

MySQL / PHP: find common pairs in a dataset of collections/groups
