Force proximity search for wordforms that map to multiple words?

Time: 2019-05-21 16:56:13

Tags: sphinx proximity

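I am using proximity search with Sphinx to good effect. For example, Twain NEAR/1 Mark returns

Mark Twain

Twain, Mark

But say I have a wordform like this:

weekday > week day

How can I make a given search use a proximity of NEAR/3 (or NEAR/X) so that it finds

Week day

Day of the week

I have other ways to skin the cat in this particular case, but in general I am looking for a way to keep multi-word mappings from simply being expanded into separate terms ('Word1 Word2', e.g. 'Week Day'), because otherwise I get back documents such as

'I worked for one entire day before realizing it was going to take a full week'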

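1 Answer:

Answer 0: (score: 0)

There is no easy way to do this out of the box. You could change your application so that it rewrites each word in the search query as "word"~N or, even better, does so only for the words that Sphinx maps to multi-word wordforms. Here is an example: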

mysql> select *, weight() from idx_min where match('weekday');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    1 | Weekday                                                                       |    1 |     2319 |
|    2 | day of week                                                                   |    2 |     1319 |
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1319 |
+------+-------------------------------------------------------------------------------+------+----------+
3 rows in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"weekday"');
+------+---------+------+----------+
| id   | doc     | a    | weight() |
+------+---------+------+----------+
|    1 | Weekday |    1 |     2319 |
+------+---------+------+----------+
1 row in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"weekday"~2');
+------+-------------+------+----------+
| id   | doc         | a    | weight() |
+------+-------------+------+----------+
|    1 | Weekday     |    1 |     2319 |
|    2 | day of week |    2 |     1319 |
+------+-------------+------+----------+
2 rows in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1500 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.00 sec)

mysql> select *, weight() from idx_min where match('weekday full week');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     2439 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.01 sec)

mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
Empty set (0.00 sec)

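The last query is the best approach (it no longer returns the false positive, document 3), but you have to:

1) Parse your query, for example like this: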

mysql> call keywords('weekday full week', 'idx_min');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | weekday   | week       |
| 2    | weekday   | day        |
| 3    | full      | full       |
| 4    | week      | week       |
+------+-----------+------------+
4 rows in set (0.00 sec)

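If the same tokenized word yields two different normalized words, that can be a signal for your application to wrap the tokenized word as "word"~N.

2) Run the query. In this case: "weekday"~2 full week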
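
The two steps above could be sketched on the application side roughly as follows. This is only a minimal illustration, not part of Sphinx: the function name wrap_multiword_forms and the row layout (qpos, tokenized, normalized) are assumptions that mirror the CALL KEYWORDS output shown earlier, and fetching those rows from the server is left to whatever MySQL client your application already uses.

```python
from itertools import groupby


def wrap_multiword_forms(keyword_rows, distance=2):
    """Rebuild a query string from CALL KEYWORDS rows.

    keyword_rows: iterable of (qpos, tokenized, normalized) tuples, in the
    order the server returned them (hypothetical layout, see lead-in).
    A token whose consecutive rows carry more than one distinct normalized
    word is treated as a multi-word wordform and wrapped as "token"~distance.
    Limitation of this sketch: consecutive repeats of the same token are
    merged into a single occurrence.
    """
    parts = []
    # Consecutive rows sharing the same tokenized value belong to one query word.
    for token, group in groupby(keyword_rows, key=lambda row: row[1]):
        normalized = {row[2] for row in group}
        if len(normalized) > 1:
            # e.g. weekday -> {week, day}: force a proximity match
            parts.append(f'"{token}"~{distance}')
        else:
            parts.append(token)
    return " ".join(parts)


# Rows copied from the CALL KEYWORDS example above.
rows = [
    (1, "weekday", "week"),
    (2, "weekday", "day"),
    (3, "full", "full"),
    (4, "week", "week"),
]
print(wrap_multiword_forms(rows))  # "weekday"~2 full week
```

The rewritten string is what you would then pass to MATCH(). The distance is kept as a parameter because a suitable N depends on how far apart you are willing to let the mapped words be.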