Suppose I have a table like this:
Mark - Green
Mark - Blue
Mark - Red
Adam - Yellow
Andrew - Red
Andrew - Green
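My goal is to compare the user "Mark" against every other user in the database to find the user he is most similar to. In this case, he is most similar to Andrew (2/3 matches) and least similar to Adam (0/3 matches). Once I have found the user most similar to Mark, I would like to extract the entries that Andrew has but Mark does not.
Is this possible in MySQL? I appreciate all the help, thank you!
Edit: All the great help went above and beyond! Thank you all so much! I will definitely look through all of your contributions!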
Answer 0: (score: 1)
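The following query tries to list all users that match Mark. It basically joins the table with Mark's entries and counts the entries each user has in common with him.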
SELECT ours.user, theirs.user, count(*) as `score`
FROM tableName as `theirs`, (SELECT *
FROM tableName
WHERE user = 'Mark') as `ours`
WHERE theirs.user != 'Mark' AND
theirs.color = ours.color
GROUP BY theirs.user
ORDER BY score DESC
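However, the query will not work if there is duplicate data (i.e. one person picks the same color twice), because every duplicated row is counted again in the join. Since you mentioned in the comments that this won't happen, it shouldn't be a problem.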
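As a quick sanity check, the idea behind the query above can be run against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, writes the join with explicit JOIN syntax, and the table name tableName and the columns user and color are assumptions carried over from the query, not something the question specifies.

```python
import sqlite3

# Sample rows from the question; table and column names are assumed.
ROWS = [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
]

def similarity_to(conn, name):
    """Count the colors each other user shares with `name`, best match first."""
    sql = """
        SELECT theirs.user, COUNT(*) AS score
        FROM tableName AS theirs
        JOIN tableName AS ours ON theirs.color = ours.color
        WHERE ours.user = ? AND theirs.user <> ?
        GROUP BY theirs.user
        ORDER BY score DESC
    """
    return conn.execute(sql, (name, name)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (user TEXT, color TEXT)")
conn.executemany("INSERT INTO tableName VALUES (?, ?)", ROWS)

scores = similarity_to(conn, "Mark")
print(scores)  # [('Andrew', 2)]
```

Note that Adam does not show up at all: a user with zero colors in common produces no joined rows, so he is simply absent rather than listed with a score of 0.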
The query can be modified to show the scores for all users:
SELECT ours.user as `myUser`, theirs.user as `theirUser`, count(*) as `score`
FROM tableName as `ours`, tableName as `theirs`
WHERE theirs.user != ours.user AND
theirs.color = ours.color
GROUP BY ours.user, theirs.user
ORDER BY score DESC
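Let Q be the above query, which gives you the most similar users. Once you have that user, you can use it to show the entries that differ between them. This is what we want to do: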
SELECT *
FROM tableName as theirs
WHERE user = 'Andrew'
AND NOT EXISTS (SELECT 1
FROM tableName as ours
WHERE ours.user = 'Mark'
AND ours.color = theirs.color)
Replacing the hard-coded inputs Andrew and Mark with Q:
SELECT similar.myUser, theirs.user, theirs.color
FROM tableName as theirs JOIN (Q) as similar
ON theirs.user = similar.theirUser
WHERE NOT EXISTS (SELECT 1
FROM tableName as ours
WHERE ours.user = similar.myUser
AND ours.color = theirs.color)
Here's the final query up and running. Hope this makes sense.
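To see the two steps working together, here is a minimal end-to-end sketch. It again uses Python's sqlite3 instead of MySQL, picks a single best match with LIMIT 1 (a choice made for this example), and adds one hypothetical row, Andrew - Purple, because in the original sample data Andrew has nothing that Mark lacks. The helper name suggestions_for is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (user TEXT, color TEXT)")
conn.executemany("INSERT INTO tableName VALUES (?, ?)", [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
    ("Andrew", "Purple"),  # hypothetical extra row so the difference is non-empty
])

def suggestions_for(conn, name):
    """Return (most_similar_user, colors that user has but `name` lacks)."""
    # Step 1: the other user sharing the most colors with `name`.
    best = conn.execute("""
        SELECT theirs.user
        FROM tableName AS ours
        JOIN tableName AS theirs
          ON theirs.color = ours.color AND theirs.user <> ours.user
        WHERE ours.user = ?
        GROUP BY theirs.user
        ORDER BY COUNT(*) DESC, theirs.user
        LIMIT 1
    """, (name,)).fetchone()
    if best is None:  # nobody shares a color with `name`
        return None, []
    # Step 2: that user's colors which `name` does not have.
    missing = conn.execute("""
        SELECT theirs.color
        FROM tableName AS theirs
        WHERE theirs.user = ?
          AND NOT EXISTS (SELECT 1 FROM tableName AS ours
                          WHERE ours.user = ? AND ours.color = theirs.color)
        ORDER BY theirs.color
    """, (best[0], name)).fetchall()
    return best[0], [color for (color,) in missing]

result = suggestions_for(conn, "Mark")
print(result)  # ('Andrew', ['Purple'])
```

Ties between equally similar users are broken alphabetically here; whether that is the right rule depends on your data.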
Answer 1: (score: 0)
Use FULLTEXT indexes. Your query would look like this:
SELECT * FROM user WHERE MATCH (name,color) AGAINST ('Mark blue');
Or the simplest way is to use a LIKE search:
SELECT * FROM user WHERE name LIKE '%Mark%' OR color = 'blue'
You can choose whichever way suits you better.
Answer 2: (score: 0)
select
    t1.name,
    sum(case when t2.cnt > t1.cnt then t1.cnt else t2.cnt end) matches
from (
    -- yourtable is a placeholder name; TABLE itself is a reserved word in MySQL
    select name, color, count(*) cnt
    from yourtable
    where name <> 'Mark'
    group by name, color
) t1 left join (
    select color, count(*) cnt
    from yourtable
    where name = 'Mark'
    group by color
) t2 on t2.color = t1.color
group by t1.name
order by matches desc
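The derived table t1 contains the count of each color for every user (except Mark), and t2 contains the same counts for Mark's colors. The two tables are then joined on color, and the smaller of the 2 counts is taken, i.e. if Amy has 2 reds and Mark has 1 red, 1 is taken as the number of matches. Finally, the results are grouped by name and ordered so that the largest number of matches comes first.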
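The "take the smaller of the two counts" rule is a multiset intersection, which can be illustrated outside SQL. The sketch below uses Python's collections.Counter, whose & operator keeps the minimum count per key; the function name match_count and the extra Green entry for Amy are made up for this example.

```python
from collections import Counter

def match_count(mine, theirs):
    """Multiset overlap: per color take the smaller of the two counts, then sum."""
    return sum((Counter(mine) & Counter(theirs)).values())

mark = ["Red"]
amy = ["Red", "Red", "Green"]
print(match_count(mark, amy))  # 1
```

Amy's second Red and her Green have no counterpart on Mark's side, so they add nothing to the score.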
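Answer 3: (score: 0)
This should get you close. The complexity comes from the fact that you allow each user to pick each color multiple times and require every identical pair to be matched against the other users you are comparing with. So what we really want to know is how many times in total a user picked each color, and how that number compares with another user's count for the same color.
First, we create a derived relation that does the simple math for us (counting each user's picks per color):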
CREATE VIEW UserColorCounts (User, Color, TimesSeen)
AS SELECT User, Color, COUNT(*) FROM YourTable GROUP BY User, Color
Second, we need a relation that compares each of the primary user's color counts with each secondary user's color counts:
CREATE VIEW UserColorMatches (User, OtherUser, Color, TimesSeen, TimesMatched)
AS SELECT P.User, S.User, P.Color, P.TimesSeen, LEAST(P.TimesSeen, S.TimesSeen)
FROM UserColorCounts P LEFT OUTER JOIN UserColorCounts S
ON P.Color = S.Color AND P.User <> S.User
Finally, we total the color counts for each primary user and compare them with the matched color counts for each secondary user:
SELECT User, OtherUser, SUM(TimesMatched) AS Matched, SUM(TimesSeen) AS OutOf
FROM UserColorMatches WHERE OtherUser IS NOT NULL
GROUP BY User, OtherUser
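For anyone who wants to try the views on the question's sample data, here is a rough sketch using Python's sqlite3 rather than MySQL. Two things differ from the statements above and are assumptions of this sketch: SQLite has no LEAST(), so its two-argument MIN() stands in for it, and the view columns are named with AS aliases instead of a column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (User TEXT, Color TEXT);
    INSERT INTO YourTable VALUES
        ('Mark', 'Green'), ('Mark', 'Blue'), ('Mark', 'Red'),
        ('Adam', 'Yellow'), ('Andrew', 'Red'), ('Andrew', 'Green');

    CREATE VIEW UserColorCounts AS
        SELECT User, Color, COUNT(*) AS TimesSeen
        FROM YourTable GROUP BY User, Color;

    -- Two-argument MIN() is SQLite's scalar stand-in for MySQL's LEAST().
    CREATE VIEW UserColorMatches AS
        SELECT P.User AS User, S.User AS OtherUser, P.Color AS Color,
               P.TimesSeen AS TimesSeen,
               MIN(P.TimesSeen, S.TimesSeen) AS TimesMatched
        FROM UserColorCounts P LEFT OUTER JOIN UserColorCounts S
          ON P.Color = S.Color AND P.User <> S.User;
""")

rows = conn.execute("""
    SELECT User, OtherUser, SUM(TimesMatched) AS Matched, SUM(TimesSeen) AS OutOf
    FROM UserColorMatches WHERE OtherUser IS NOT NULL
    GROUP BY User, OtherUser
    ORDER BY User
""").fetchall()
print(rows)  # [('Andrew', 'Mark', 2, 2), ('Mark', 'Andrew', 2, 2)]
```

On this data Mark against Andrew comes out as 2 out of 2 rather than 2 out of 3, because the row for Blue has no OtherUser and is dropped by the IS NOT NULL filter. If you need the denominator to be all of the primary user's picks, total TimesSeen per user separately.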
Answer 4: (score: 0)
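-- yourtable is a placeholder name; TABLE and MATCH are reserved words in MySQL,
-- so neither can be used as an unquoted table name or alias
select other.name, count(*) as count
from yourtable as mine
join yourtable as other
  on other.name <> mine.name
 and mine.name = 'Mark'
 and other.color = mine.color
group by other.name
order by count(*) desc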
Answer 5: (score: 0)
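The following query returns the matching score between name and matching_name, along with the maximum score that can be achieved, so you know what percentage the match represents.
This code counts duplicate values in the color column as just one, so if you have the record Mark - Red twice, it will only count as 1.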
select
foo.name, foo.matching_name, count(*) AS matching_score, goo.color_no AS max_score
from
(
select
distinct a.name, a.color, b.name AS matching_name
from
(
select name, color from yourtable
) a
left join yourtable b on a.color = b.color and a.name <> b.name
where b.name is not null
) foo
left join ( select name, count(distinct color) AS color_no from yourtable group by name ) goo
on foo.name = goo.name
group by foo.name, foo.matching_name
An SQLFiddle is attached to preview the output.
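If you would rather check the behaviour locally, the sketch below runs a slightly simplified form of the query with Python's sqlite3 instead of MySQL. The simplifications are mine: the left joins followed by an IS NOT NULL filter are written as plain inner joins, goo.color_no is added to the GROUP BY, and a duplicate Mark - Red row is inserted to show that duplicates count only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (name TEXT, color TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    ("Mark", "Green"), ("Mark", "Blue"), ("Mark", "Red"),
    ("Mark", "Red"),  # duplicate added on purpose for this example
    ("Adam", "Yellow"), ("Andrew", "Red"), ("Andrew", "Green"),
])

sql = """
    SELECT foo.name, foo.matching_name,
           COUNT(*) AS matching_score, goo.color_no AS max_score
    FROM (SELECT DISTINCT a.name, a.color, b.name AS matching_name
          FROM yourtable a
          JOIN yourtable b ON a.color = b.color AND a.name <> b.name) foo
    JOIN (SELECT name, COUNT(DISTINCT color) AS color_no
          FROM yourtable GROUP BY name) goo ON foo.name = goo.name
    GROUP BY foo.name, foo.matching_name, goo.color_no
    ORDER BY foo.name
"""
rows = conn.execute(sql).fetchall()
for name, other, score, max_score in rows:
    # Integer percentage of the user's distinct colors matched by the other user.
    print(f"{name} vs {other}: {score}/{max_score} = {100 * score // max_score}%")
```

The score is not symmetric: Andrew's two colors are fully covered by Mark, while Mark has a third color that Andrew lacks.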