I have a database of cities and states (about 43,000 of them). I run a full-text search on it like this:
select city, state, match(city, state_short, state) against (:q in boolean mode) as score
from zipcodes where
match(city, state_short, state) against (:q in boolean mode)
group by city, state order by score desc limit 6
It works when I replace :q with a sensible string, but say I search for houston texas. I expect that result to come first, yet it comes third:
North Houston, Texas
South Houston, Texas
Houston, Texas
How can I make Houston, Texas rank above the other two? The same should obviously hold for other cities like this one.
Edit
Would this work? Any thoughts?
SELECT * FROM (
SELECT city, state, MATCH(city, state_short, state) AGAINST (:q IN BOOLEAN MODE) as score
FROM zipcodes
WHERE MATCH(city, state_short, state) AGAINST (:q IN BOOLEAN MODE)
GROUP BY city, state
ORDER BY score DESC LIMIT 6
) AS tbl
ORDER BY score DESC, LENGTH(city)
Answer 0 (score: 1)
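Your new query might work, but it only gets there indirectly. Something like ORDER BY ABS(LENGTH(:q) - (LENGTH(city) + LENGTH(state))) would be better than ORDER BY LENGTH(city). It isn't perfect, but it should do better, because a row with a high score whose length is close to that of the input is probably the one you are looking for. The final query would look like this: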
SELECT city, state, MATCH(city, state_short, state) AGAINST (:q IN BOOLEAN MODE) AS score
FROM zipcodes
WHERE MATCH(city, state_short, state) AGAINST (:q IN BOOLEAN MODE)
GROUP BY city, state
ORDER BY score DESC, ABS(LENGTH(:q) - (LENGTH(city) + LENGTH(state))) ASC LIMIT 6
I moved the new ORDER BY clause into the main query to get rid of the subquery. This should produce the same (or possibly more accurate) results.
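To see why the smallest length difference should sort first, here is a minimal Python sketch of the tiebreak arithmetic. The sample rows and the names length_gap and rank are made up for illustration, and it assumes all three rows get the same full-text score. Note that MySQL's LENGTH counts bytes while Python's len counts characters; the two agree for plain ASCII names like these.

```python
# Hypothetical sample rows; in the real table these come from zipcodes.
rows = [
    ("North Houston", "Texas"),
    ("South Houston", "Texas"),
    ("Houston", "Texas"),
]

def length_gap(query, city, state):
    """Mirror ABS(LENGTH(:q) - (LENGTH(city) + LENGTH(state)))."""
    return abs(len(query) - (len(city) + len(state)))

def rank(query, rows):
    # Smallest gap first. Assumes every row has the same full-text score,
    # so only the tiebreak decides the order. sorted() is stable, so rows
    # with equal gaps keep their original relative order.
    return sorted(rows, key=lambda r: length_gap(query, r[0], r[1]))

ranked = rank("houston texas", rows)
for city, state in ranked:
    print(city, state, length_gap("houston texas", city, state))
```

The query is 13 characters long. Houston plus Texas is 12, a gap of 1, while North Houston plus Texas is 18, a gap of 5, so sorting the gap in ascending order puts Houston, Texas first.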
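Levenshtein distance would probably be a more accurate measure, but MySQL has no native implementation of it. This post has more information on implementing a Levenshtein distance function in MySQL.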
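For reference, this is roughly what ranking by Levenshtein distance looks like when done in application code instead of in MySQL. It is only a sketch: the function names and sample rows are invented for illustration, and joining city and state with a single space and lowercasing both sides are assumptions about how the query would be compared.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]

def rank_by_distance(query, rows):
    # Assumption: compare the query against "city state", case-insensitively.
    q = query.lower()
    return sorted(rows, key=lambda r: levenshtein(q, f"{r[0]} {r[1]}".lower()))

# Hypothetical candidates, e.g. the rows returned by the full-text query.
candidates = [
    ("North Houston", "Texas"),
    ("South Houston", "Texas"),
    ("Houston", "Texas"),
]
best = rank_by_distance("houston texas", candidates)
print(best[0])
```

An approach like this could re-rank the handful of rows the full-text query returns, which avoids computing the distance for every row in the table.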