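OK, so I have a nice little query that returns scored results. The query is currently LIKE-based, and I want to convert it to a full-text query, as everyone keeps telling me to. I want to get the same order of results, if not the same scores. The only way I have been able to get anything close is by unrolling my cross joins...
How can I make my original query full-text based while keeping it relatively small and organized?
You can see both queries on SQLFiddle.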
Original query - nice and small, with the scores and search terms each defined in one place
SELECT
sum(score * multiplier) score,
a.id,
a.title
FROM
(
SELECT 3 score, 'a railway employee' term UNION ALL
SELECT 2 score, 'railway employee' term UNION ALL
SELECT 2 score, 'a railway' term UNION ALL
SELECT 1 score, 'employee' term UNION ALL
SELECT 1 score, 'railway' term UNION ALL
SELECT 0 score, 'a' term
) terms
CROSS JOIN
(
SELECT 'T' TYPE, 1 multiplier
UNION ALL SELECT 'S', 1.1
UNION ALL SELECT 'C', 1.5
) x
INNER JOIN
(
SELECT id, 'T' TYPE, title SEARCH FROM articles
UNION ALL
SELECT id, 'S' TYPE, summary SEARCH FROM articles WHERE summary <> ''
UNION ALL
SELECT artId, 'C' TYPE, content SEARCH FROM articleSections
) s ON s.TYPE = x.TYPE AND SEARCH LIKE concat('%', terms.term, '%')
INNER JOIN articles a ON a.id = s.id
WHERE terms.score > 0
GROUP BY a.id, a.title
ORDER BY score DESC, a.title;
Full text - messy and large, with the scores and search terms scattered all over the place
SELECT
sum(score * multiplier) score,
id,
title
FROM
(
SELECT
3 score,
1 multiplier,
'T' AS loc,
id,
title
FROM articles
WHERE MATCH(title) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1 multiplier,
'T' AS loc,
id,
title
FROM articles
WHERE MATCH(title) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1 multiplier,
'T' AS loc,
id,
title
FROM articles
WHERE MATCH(title) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1 multiplier,
'T' AS loc,
id,
title
FROM articles
WHERE MATCH(title) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1 multiplier,
'T' AS loc,
id,
title
FROM articles
WHERE MATCH(title) AGAINST ('employee' IN BOOLEAN MODE)
UNION ALL
SELECT
3 score,
1.1 multiplier,
'S' AS loc,
id,
title
FROM articles
WHERE MATCH(summary) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1.1 multiplier,
'S' AS loc,
id,
title
FROM articles
WHERE MATCH(summary) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1.1 multiplier,
'S' AS loc,
id,
title
FROM articles
WHERE MATCH(summary) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1.1 multiplier,
'S' AS loc,
id,
title
FROM articles
WHERE MATCH(summary) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1.1 multiplier,
'S' AS loc,
id,
title
FROM articles
WHERE MATCH(summary) AGAINST ('employee' IN BOOLEAN MODE)
UNION ALL
SELECT
3 score,
1.5 multiplier,
'C' AS loc,
id,
title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1.5 multiplier,
'C' AS loc,
id,
title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
2 score,
1.5 multiplier,
'C' AS loc,
id,
title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1.5 multiplier,
'C' AS loc,
id,
title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
1 score,
1.5 multiplier,
'C' AS loc,
id,
title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('employee' IN BOOLEAN MODE)
) t
WHERE score > 0
GROUP BY id, title
ORDER BY score DESC, title;
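Answer 0: (score: 0)
This is too long for a comment.
Clearly, you have very specific scoring needs that neither the natural language mode nor the boolean mode of full-text search provides. I wonder whether MySQL has some hidden mechanism that would give you the list of matching search keywords, which you could then use for scoring. I am not aware of one.
If you have a large corpus and relatively rare words (meaning the words you are looking for occur in relatively few documents), then you can use boolean mode to reduce the search space. Such a query would look something like: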
select t.id, sum(terms.score * wherefactor.factor)
from (select t.*
. . .
where MATCH(title, summary, content) AGAINST ('railway employee' IN BOOLEAN MODE)
) t left outer join
(SELECT 3 score, 'a railway employee' term UNION ALL
SELECT 2 score, 'railway employee' term UNION ALL
SELECT 2 score, 'a railway' term UNION ALL
SELECT 1 score, 'employee' term UNION ALL
SELECT 1 score, 'railway' term UNION ALL
SELECT 0 score, 'a' term
) terms cross join
(SELECT 'T' as which, 1.0 as factor UNION ALL
SELECT 'S', 1.1 UNION ALL
SELECT 'C', 1.5
) wherefactor
on (case when wherefactor.which = 'T' then title
when wherefactor.which = 'S' then summary
when wherefactor.which = 'C' then content
end) like concat('%', term, '%')
group by t.id;
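This should give you the performance of full-text search along with the specifics of your scoring algorithm, because the LIKE comparisons only run against the rows that survive the full-text filter.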
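As an illustration only, here is one way the prefilter idea might be written out in full. It is a sketch under assumptions, not a tested drop-in: it assumes FULLTEXT indexes exist on articles(title, summary) and on articleSections(content), and the aliases body and total_score are made up for the example. Full-text search matches whole words, so a row that LIKE would catch through a substring (say, "railways") could be filtered out here, and the result order may differ from the original in such cases.

```sql
-- Sketch: full-text narrows the candidate rows, LIKE does the scoring.
-- Assumes FULLTEXT indexes on articles(title, summary)
-- and articleSections(content).
SELECT a.id, a.title, SUM(terms.score * s.multiplier) AS total_score
FROM (
    -- Title rows, multiplier 1.0
    SELECT id, 1.0 AS multiplier, title AS body
    FROM articles
    WHERE MATCH(title, summary) AGAINST ('railway employee' IN BOOLEAN MODE)
    UNION ALL
    -- Summary rows, multiplier 1.1
    SELECT id, 1.1, summary
    FROM articles
    WHERE summary <> ''
      AND MATCH(title, summary) AGAINST ('railway employee' IN BOOLEAN MODE)
    UNION ALL
    -- Content rows, multiplier 1.5
    SELECT artId, 1.5, content
    FROM articleSections
    WHERE MATCH(content) AGAINST ('railway employee' IN BOOLEAN MODE)
) s
INNER JOIN (
    -- Scores and search terms stay in one place, as in the original query.
    SELECT 3 AS score, 'a railway employee' AS term UNION ALL
    SELECT 2, 'railway employee' UNION ALL
    SELECT 2, 'a railway' UNION ALL
    SELECT 1, 'employee' UNION ALL
    SELECT 1, 'railway'
) terms ON s.body LIKE CONCAT('%', terms.term, '%')
INNER JOIN articles a ON a.id = s.id
GROUP BY a.id, a.title
ORDER BY total_score DESC, a.title;
```

The boolean-mode string 'railway employee' has no operators, so a row passes the filter if it contains either word. That keeps the filter loose enough for every scored term except the zero-score 'a', which is left out of the terms list.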
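If you have a known lexicon, another possibility is to build a document-term table. Such a table has one row for each document and each term in that document that you care about (the set of terms you care about is the "lexicon"). With such a data structure, you are free to implement whatever scoring mechanism you choose.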
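A minimal sketch of what such a document-term table could look like follows. The table docTerms and its columns are hypothetical names, and it assumes that the lexicon includes the phrases as well as the single words, and that you populate the table yourself (for example whenever an article is saved).

```sql
-- Hypothetical document-term table: one row per article, location and term.
CREATE TABLE docTerms (
    artId INT          NOT NULL,
    loc   CHAR(1)      NOT NULL,            -- 'T' title, 'S' summary, 'C' content
    term  VARCHAR(100) NOT NULL,            -- a word or phrase from the lexicon
    hits  INT          NOT NULL DEFAULT 1,  -- number of rows (e.g. sections) containing the term
    PRIMARY KEY (term, artId, loc)
);

-- Scoring becomes an indexed equality join, with no LIKE or MATCH.
SELECT a.id, a.title,
       SUM(terms.score * x.multiplier * d.hits) AS total_score
FROM (
    SELECT 3 AS score, 'a railway employee' AS term UNION ALL
    SELECT 2, 'railway employee' UNION ALL
    SELECT 2, 'a railway' UNION ALL
    SELECT 1, 'employee' UNION ALL
    SELECT 1, 'railway'
) terms
INNER JOIN docTerms d ON d.term = terms.term
INNER JOIN (
    SELECT 'T' AS loc, 1.0 AS multiplier UNION ALL
    SELECT 'S', 1.1 UNION ALL
    SELECT 'C', 1.5
) x ON x.loc = d.loc
INNER JOIN articles a ON a.id = d.artId
GROUP BY a.id, a.title
ORDER BY total_score DESC, a.title;
```

The hits column is there because the original query counts each matching section separately. Multiplying by it keeps that behaviour when an article has several sections containing the same term.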