MySQL: find the most similar user

Date: 2015-11-22 22:03:21

Tags: mysql sql

Suppose I have a table like this:

Mark - Green
Mark - Blue
Mark - Red
Adam - Yellow
Andrew - Red
Andrew - Green

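My goal is to compare the user "Mark" with every other user in the database to find the user most similar to him. In this case he is most similar to Andrew (2 of 3 colors match) and least similar to Adam (0 of 3). Once I've found which user is most similar to Mark, I want to extract what Andrew has that Mark doesn't.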
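To pin down the intended result, here is a minimal plain-Python sketch (outside MySQL) of the scoring described above. It assumes each user picks a given color at most once; the ROWS list stands in for the table, and the helper names are hypothetical.

```python
# Hypothetical in-memory copy of the sample table: (user, color) rows.
ROWS = [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Adam", "Yellow"),
    ("Andrew", "Red"), ("Andrew", "Green"),
]

def colors_by_user(rows):
    """Group the rows into a set of colors per user."""
    result = {}
    for user, color in rows:
        result.setdefault(user, set()).add(color)
    return result

def rank_similar(target, rows):
    """Return (user, shared_count) for every other user, best match first."""
    colors = colors_by_user(rows)
    mine = colors[target]
    scores = [(user, len(mine & theirs))
              for user, theirs in colors.items() if user != target]
    # Sort by shared count descending, then by name for a stable order.
    return sorted(scores, key=lambda s: (-s[1], s[0]))

def missing_colors(target, other, rows):
    """Colors that `other` picked but `target` did not."""
    colors = colors_by_user(rows)
    return colors[other] - colors[target]

ranking = rank_similar("Mark", ROWS)
print(ranking)  # [('Andrew', 2), ('Adam', 0)]
best = ranking[0][0]
print(best, missing_colors("Mark", best, ROWS))  # Andrew set()
```

With this sample data Andrew has no color that Mark lacks, so the last step returns an empty set.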

Is this possible in MySQL? I appreciate any help, thank you all!

Edit: All the help has gone above and beyond! Thank you all so much! I'll be sure to look through all of your contributions!

6 answers:

Answer 0 (score: 1)

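The following query tries to list all users that match Mark. It basically joins the table with Mark's entries and counts, for every other user, the entries they have in common with him.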

SELECT ours.user, theirs.user, count(*) as `score`
FROM tableName as `theirs`, (SELECT *
                              FROM tableName
                              WHERE user = 'Mark') as `ours`
WHERE theirs.user != 'Mark' AND
      theirs.color = ours.color
GROUP BY ours.user, theirs.user
ORDER BY score DESC

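However, if there is duplicate data (i.e. a person picks the same color twice), the query will not work correctly, because each duplicate row is counted again. Since you mentioned in the comments that this won't happen, it shouldn't be a problem.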
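One way to check this query and the duplicate caveat is to run it against an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for MySQL, and tableName, user and color are the placeholder names from the answer.

```python
import sqlite3

# The query above (grouped by both users); only portable SQL is used.
QUERY = """
SELECT ours.user, theirs.user, count(*) AS score
FROM tableName AS theirs,
     (SELECT * FROM tableName WHERE user = 'Mark') AS ours
WHERE theirs.user != 'Mark' AND theirs.color = ours.color
GROUP BY ours.user, theirs.user
ORDER BY score DESC
"""

def scores(rows):
    """Load rows into a fresh in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableName (user TEXT, color TEXT)")
    con.executemany("INSERT INTO tableName VALUES (?, ?)", rows)
    return con.execute(QUERY).fetchall()

SAMPLE = [("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
          ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green")]

print(scores(SAMPLE))  # [('Mark', 'Andrew', 2)]  (Adam has no match, so no row)
# A duplicate pick inflates the score: Andrew - Red twice now gives 3.
print(scores(SAMPLE + [("Andrew", "Red")]))  # [('Mark', 'Andrew', 3)]
```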

The query can be modified to show the scores between all users:

SELECT ours.user as `myUser`, theirs.user as `theirUser`, count(*) as `score`
FROM tableName as `ours`, tableName as `theirs`
WHERE theirs.user != ours.user AND
      theirs.color = ours.color
GROUP BY ours.user, theirs.user
ORDER BY score DESC
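As a quick sketch (SQLite standing in for MySQL, same placeholder names), running this query on the sample data shows that every matching pair appears in both directions, while users with no shared color, like Adam, do not appear at all.

```python
import sqlite3

# The "all users" query from the answer.
Q = """
SELECT ours.user AS myUser, theirs.user AS theirUser, count(*) AS score
FROM tableName AS ours, tableName AS theirs
WHERE theirs.user != ours.user AND theirs.color = ours.color
GROUP BY ours.user, theirs.user
ORDER BY score DESC
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableName (user TEXT, color TEXT)")
con.executemany("INSERT INTO tableName VALUES (?, ?)", [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
])
rows = con.execute(Q).fetchall()
# Ties in score have no guaranteed order, so sort before printing.
print(sorted(rows))  # [('Andrew', 'Mark', 2), ('Mark', 'Andrew', 2)]
```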

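Let Q be the query above, which gives you the most similar users. Once you have that user, you can use it to show the entries that differ between the two. Here's how we do that: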

SELECT * 
FROM tableName as theirs
WHERE user = 'Andrew'
AND NOT EXISTS (SELECT 1
                FROM tableName as ours
                WHERE ours.user = 'Mark'
                      AND ours.color = theirs.color)

Now replace the hard-coded inputs Andrew and Mark with the columns of Q:

SELECT similar.myUser, theirs.user, theirs.color  
FROM tableName as theirs JOIN (Q) as similar
ON theirs.user = similar.theirUser
WHERE NOT EXISTS (SELECT 1
                  FROM tableName as ours
                  WHERE ours.user = similar.myUser
                        AND ours.color = theirs.color)

Here's the final query up and running. Hope this makes sense.
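As a sketch, assuming SQLite in place of MySQL, the final query can be run with Q substituted in. Q's ORDER BY is dropped here, since the order of rows inside the subquery does not affect the result.

```python
import sqlite3

SAMPLE = [("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
          ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green")]

# Q: shared-color score for every ordered pair of users.
Q = """SELECT ours.user AS myUser, theirs.user AS theirUser, count(*) AS score
       FROM tableName AS ours, tableName AS theirs
       WHERE theirs.user != ours.user AND theirs.color = ours.color
       GROUP BY ours.user, theirs.user"""

# Rows the similar user has that myUser does not (NOT EXISTS anti-join).
FINAL = f"""
SELECT similar.myUser, theirs.user, theirs.color
FROM tableName AS theirs JOIN ({Q}) AS similar
  ON theirs.user = similar.theirUser
WHERE NOT EXISTS (SELECT 1 FROM tableName AS ours
                  WHERE ours.user = similar.myUser
                    AND ours.color = theirs.color)
"""

def differences(rows):
    """Load rows into a fresh in-memory table and run FINAL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableName (user TEXT, color TEXT)")
    con.executemany("INSERT INTO tableName VALUES (?, ?)", rows)
    return con.execute(FINAL).fetchall()

# Andrew has nothing Mark lacks; Mark has Blue, which Andrew lacks.
print(differences(SAMPLE))  # [('Andrew', 'Mark', 'Blue')]
```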

Answer 1 (score: 0)

Use FULLTEXT indexes. Your query would look like this:

SELECT * FROM user WHERE MATCH (name,color) AGAINST ('Mark blue'); 

Or, the simplest way is to use a LIKE search:

SELECT * FROM user WHERE name LIKE '%Mark%' OR color = 'blue'

Choose whichever approach suits you better.

Answer 2 (score: 0)

select 
    name,
    sum(case when t2.cnt > t1.cnt then t1.cnt else t2.cnt end) matches
from (
    select name, color, count(*) cnt
    from `table`
    where name <> 'Mark'
    group by name, color
) t1 left join (
    select color, count(*) cnt
    from `table`
    where name = 'Mark'
    group by color
) t2 on t2.color = t1.color
group by name
order by matches desc

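Derived table t1 holds the per-color counts for every user except Mark, and t2 holds Mark's per-color counts. The two are joined on color and the smaller of the two counts is taken: if Amy has Red twice and Mark has Red once, that counts as 1 match. Finally the results are grouped by name and ordered so the highest match count comes first.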
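The "take the smaller count per color, then sum" rule is a multiset intersection. A minimal Python illustration using collections.Counter, whose & operator keeps the minimum of the two counts for each key; the function name is hypothetical.

```python
from collections import Counter

def min_count_matches(mine, theirs):
    """Sum over colors of min(my count, their count), like the CASE above."""
    # Counter & Counter keeps each color with the smaller of its two counts.
    return sum((Counter(mine) & Counter(theirs)).values())

print(min_count_matches(["Red"], ["Red", "Red"]))  # 1  (the Amy example)
print(min_count_matches(["Green", "Blue", "Red"], ["Red", "Green"]))  # 2
```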

Answer 3 (score: 0)

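This should get you close. The complexity comes from the fact that you allow each user to pick each color more than once, and require every identical pick to be matched by the other user you compare against. So what we really want to know is how many times each user picked each color, and how that number compares with the number of times the other user picked the same color.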

First, we create a derived relation (a view) that does the simple math for us, counting each user's picks per color:

 CREATE VIEW UserColorCounts (User, Color, TimesSeen) 
   AS SELECT User, Color, COUNT(*) FROM YourTable GROUP BY User, Color

Second, we need a relation that compares each of the primary user's color counts with each secondary user's count for the same color:

 CREATE VIEW UserColorMatches (User, OtherUser, Color, TimesSeen, TimesMatched)
  AS SELECT P.User, S.User, P.Color, P.TimesSeen, LEAST(P.TimesSeen, S.TimesSeen)
  FROM UserColorCounts P LEFT OUTER JOIN UserColorCounts S
  ON P.Color = S.Color AND P.User <> S.User

Finally, we total each primary user's color counts and compare them with the matched color counts for each secondary user:

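 SELECT M.User, M.OtherUser, SUM(M.TimesMatched) AS Matched,
        -- OutOf: all of the primary user's picks, including unmatched colors
        (SELECT SUM(C.TimesSeen) FROM UserColorCounts C WHERE C.User = M.User) AS OutOf
    FROM UserColorMatches M WHERE M.OtherUser IS NOT NULL
    GROUP BY M.User, M.OtherUser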
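To try the three steps end to end, here is a sketch using SQLite from Python. Assumptions: SQLite has no LEAST(), so its two-argument scalar MIN() stands in for it, and OutOf is computed from each user's total picks through a subquery so that colors with no match still count toward the total.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE YourTable (User TEXT, Color TEXT);
INSERT INTO YourTable VALUES
  ('Mark','Green'), ('Mark','Blue'), ('Mark','Red'),
  ('Adam','Yellow'), ('Andrew','Red'), ('Andrew','Green');

-- Picks per (user, color).
CREATE VIEW UserColorCounts (User, Color, TimesSeen) AS
  SELECT User, Color, COUNT(*) FROM YourTable GROUP BY User, Color;

-- SQLite's two-argument MIN() plays the role of MySQL's LEAST().
CREATE VIEW UserColorMatches (User, OtherUser, Color, TimesSeen, TimesMatched) AS
  SELECT P.User, S.User, P.Color, P.TimesSeen, MIN(P.TimesSeen, S.TimesSeen)
  FROM UserColorCounts P LEFT OUTER JOIN UserColorCounts S
    ON P.Color = S.Color AND P.User <> S.User;
""")

rows = con.execute("""
  SELECT M.User, M.OtherUser, SUM(M.TimesMatched) AS Matched,
         (SELECT SUM(C.TimesSeen) FROM UserColorCounts C
          WHERE C.User = M.User) AS OutOf
  FROM UserColorMatches M WHERE M.OtherUser IS NOT NULL
  GROUP BY M.User, M.OtherUser
""").fetchall()
print(sorted(rows))  # [('Andrew', 'Mark', 2, 2), ('Mark', 'Andrew', 2, 3)]
```

Mark's row reads as 2 matches out of 3 picks against Andrew, which is the ratio the question asks for.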

Answer 4 (score: 0)

   select m.name, count(*) as count
     from `table` as t
     join `table` as m
           on m.name <> t.name
          and t.name = 'mark'
          and m.color = t.color
    group by m.name
    order by count(*) desc
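A sketch of this query on SQLite, assuming a hypothetical table named yourtable with the name and color columns used above. The aliases t and m avoid `table` and `match`, which are keywords in both MySQL and SQLite, and COLLATE NOCASE imitates MySQL's usually case-insensitive default collation so that 'mark' still finds Mark.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (name TEXT, color TEXT)")
con.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
])
# Self-join: t holds Mark's rows, m holds other users' rows with the same color.
rows = con.execute("""
    SELECT m.name, COUNT(*) AS match_count
    FROM yourtable AS t
    JOIN yourtable AS m
      ON m.name <> t.name
     AND t.name = 'mark' COLLATE NOCASE
     AND m.color = t.color
    GROUP BY m.name
    ORDER BY COUNT(*) DESC
""").fetchall()
print(rows)  # [('Andrew', 2)]
```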

Answer 5 (score: 0)

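The following query returns the match score between name and matching_name, along with the highest achievable score, so you can tell what percentage of a match each pair is.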

This code counts duplicate values in the color column only once, so if Mark - Red is recorded twice, it still counts as 1.

select
  foo.name, foo.matching_name, count(*) AS matching_score, goo.color_no AS max_score 
from 
(
  select 
    distinct a.name, a.color, b.name AS matching_name
  from 
    (
      select name, color from yourtable
    ) a
    left join yourtable b on a.color = b.color and a.name <> b.name
  where b.name is not null
) foo
left join ( select name, count(distinct color) AS color_no from yourtable group by name ) goo
  on foo.name = goo.name
group by foo.name, foo.matching_name

An SQLFiddle was attached to preview the output.
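To see the duplicate handling in action, here is the same query (only reformatted) run on SQLite from Python, with a deliberate duplicate Mark - Red row. SQLite stands in for MySQL, and yourtable is the answer's placeholder name.

```python
import sqlite3

QUERY = """
select foo.name, foo.matching_name, count(*) AS matching_score, goo.color_no AS max_score
from (
  select distinct a.name, a.color, b.name AS matching_name
  from (select name, color from yourtable) a
  left join yourtable b on a.color = b.color and a.name <> b.name
  where b.name is not null
) foo
left join (select name, count(distinct color) AS color_no
           from yourtable group by name) goo
  on foo.name = goo.name
group by foo.name, foo.matching_name
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (name TEXT, color TEXT)")
con.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Mark", "Red"),  # deliberate duplicate; DISTINCT collapses it
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
])
rows = sorted(con.execute(QUERY).fetchall())
print(rows)  # [('Andrew', 'Mark', 2, 2), ('Mark', 'Andrew', 2, 3)]
```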