I'm using proximity search with Sphinx successfully. For example, Twain NEAR/1 Mark will return both "Mark Twain" and "Twain, Mark".
But say I have a wordform like this:
weekday > week day
How do I set up a search to use proximity NEAR/3 (or NEAR/X) so that it finds both "weekday" and "day of week"?
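In this particular case I have other ways to skin the cat, but in general I'd like a way that doesn't turn multi-word mappings into 'Word1 Word2' (i.e. 'Week Day'), because otherwise I get documents such as 'I worked for one entire day before realizing it was going to take a full week'.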
Answer (score: 0)
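There is no easy way to do this out of the box. You could change your application so that it rewrites every "word" in the search query as "word"~N, or, even better, only the words that Sphinx expands through wordforms. Here is an example: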
mysql> select *, weight() from idx_min where match('weekday');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1319 |
+------+-------------------------------------------------------------------------------+------+----------+
3 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"');
+------+---------+------+----------+
| id | doc | a | weight() |
+------+---------+------+----------+
| 1 | Weekday | 1 | 2319 |
+------+---------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2');
+------+-------------+------+----------+
| id | doc | a | weight() |
+------+-------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
+------+-------------+------+----------+
2 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1500 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('weekday full week');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 2439 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.01 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
Empty set (0.00 sec)
The last query is the best approach, but to get there you have to:
1) Parse your query, for example like this:
mysql> call keywords('weekday full week', 'idx_min');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | weekday | week |
| 2 | weekday | day |
| 3 | full | full |
| 4 | week | week |
+------+-----------+------------+
4 rows in set (0.00 sec)
If the same tokenized word yields two different normalized words, that is a signal for your application to wrap that tokenized word as "word"~N.
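Step 1 could be done on the application side roughly as below. This is a minimal sketch, assuming the CALL KEYWORDS rows have already been fetched as (qpos, tokenized, normalized) tuples; the function name rewrite_query and the default distance n=2 are illustrative, not part of Sphinx. Treating consecutive rows that share the same tokenized word as one query token is a heuristic, not something Sphinx guarantees.

```python
from itertools import groupby


def rewrite_query(keyword_rows, n=2):
    """Hypothetical helper: wrap tokens that expand to several normalized words as "token"~n.

    keyword_rows: (qpos, tokenized, normalized) tuples, e.g. the rows
    returned by CALL KEYWORDS(query, index).
    """
    parts = []
    # Sort by qpos (first tuple element), then group consecutive rows that
    # share the same tokenized word; heuristic: one group == one query token.
    for token, group in groupby(sorted(keyword_rows), key=lambda row: row[1]):
        rows = list(group)
        forms = {row[2] for row in rows}
        if len(forms) > 1:
            # e.g. weekday -> week + day: require the forms within n words
            parts.append(f'"{token}"~{n}')
        else:
            # ordinary token: keep one entry per row so repeats survive
            parts.extend(token for _ in rows)
    return " ".join(parts)


if __name__ == "__main__":
    rows = [(1, "weekday", "week"), (2, "weekday", "day"),
            (3, "full", "full"), (4, "week", "week")]
    print(rewrite_query(rows))
```

Fed the four rows from the CALL KEYWORDS output above, it produces "weekday"~2 full week, i.e. the query that gave the best result earlier.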
2) Run the query, in this case "weekday"~2 full week
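For step 2, the rewritten expression only has to go inside MATCH(). searchd speaks the MySQL protocol for SphinxQL, so any MySQL client can send the statement. The helper below (build_sphinxql, a hypothetical name) is a sketch that only builds the statement string, escaping backslashes and single quotes so the expression stays inside the string literal; it assumes the index name (idx_min in the example above) is trusted. If your client library supports parameter binding, that is usually the safer choice.

```python
def build_sphinxql(index: str, match_expr: str) -> str:
    """Hypothetical helper: build a SphinxQL SELECT for a full-text MATCH expression."""
    # Escape backslashes first, then single quotes, for the single-quoted literal.
    # Double quotes (phrase/proximity operators) are left untouched.
    escaped = match_expr.replace("\\", "\\\\").replace("'", "\\'")
    # The index name is interpolated as-is: it must come from trusted code.
    return f"SELECT *, WEIGHT() FROM {index} WHERE MATCH('{escaped}')"


if __name__ == "__main__":
    print(build_sphinxql("idx_min", '"weekday"~2 full week'))
```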