Question

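Using Sphinx 2.1.4-id64-dev (rel21-r4324).

I want to search across multiple fields, but I don't want repeated words to increase the weight.

So I am using the ranker=matchany option.

This works as expected when the duplicates are within a single field: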

MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) bar') OPTION ranker=matchany;
+------+---------+------+----------+
| id   | val     | val2 | weight() |
+------+---------+------+----------+
|    3 | bar     |      |        1 |
|    4 | bar bar |      |        1 |
+------+---------+------+----------+
2 rows in set (0.00 sec)

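=> Although document 4 contains the word twice, the weights are equal.

However, when the duplicates are spread across multiple fields, this no longer works: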

MySQL [(none)]> select id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=matchany;
+------+------+------+----------+
| id   | val  | val2 | weight() |
+------+------+------+----------+
|    2 | foo  | foo  |        2 |
|    1 | foo  |      |        1 |
+------+------+------+----------+
2 rows in set (0.00 sec)

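=> The weight of id 2 is greater than the weight of id 1.

Is there a way to apply the "matchany" ranking mode across multiple fields?

Here is a sample sphinx.conf file: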

source nptest
{
        type                    = mysql
        sql_host                = localhost
        sql_user                = myuser
        sql_pass                = mypass
        sql_db                  = test
        sql_port                = 3306

        sql_query               = \
                SELECT 1, 'foo' AS val, '' AS val2 \
                UNION \
                SELECT 2, 'foo', 'foo' \
                UNION \
                SELECT 3, 'bar', '' \
                UNION \
                SELECT 4, 'bar bar', ''

        sql_field_string = val
        sql_field_string = val2
}

index nptest
{
        type                    = plain
        source                  = nptest
        path                    = /var/lib/sphinxsearch/data/nptest
        morphology              = none
}

Answer 1

You need the expression ranker: http://sphinxsearch.com/docs/current.html#weighting

You can start from the default expression for matchany and tweak it.

Using doc_word_count instead of sum(word_count) should be useful.
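
To make the difference concrete, here is a small Python toy model (not Sphinx code) of the two expressions, run against the four rows from the question. It assumes the documented matchany default is sum((word_count+(lcs-1)*max_lcs)*user_weight), that sum() only iterates over matched fields, and that doc_word_count is the number of unique keywords matched anywhere in the document. It only handles single-keyword queries, where lcs is 1 and the lcs term drops out. The function names are made up for illustration.

```python
# Toy model of Sphinx ranking factors, NOT Sphinx code.
# Assumption: single-keyword queries only, so lcs == 1 in every matched
# field and the (lcs - 1) * max_lcs term contributes nothing.

DOCS = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    3: {"val": "bar", "val2": ""},
    4: {"val": "bar bar", "val2": ""},
}


def matched_fields(doc, keyword):
    """Names of the fields that contain the keyword at least once."""
    return [name for name, text in doc.items() if keyword in text.split()]


def matchany_weight(doc, keyword, user_weight=1):
    """sum((word_count + (lcs-1)*max_lcs) * user_weight) over matched fields."""
    word_count, lcs, max_lcs = 1, 1, 1  # one unique keyword per matched field
    return sum(
        (word_count + (lcs - 1) * max_lcs) * user_weight
        for _ in matched_fields(doc, keyword)
    )


def doc_word_count_weight(doc, keyword):
    """Unique keywords matched anywhere in the document (0 or 1 here)."""
    return 1 if matched_fields(doc, keyword) else 0


for doc_id in (1, 2):
    doc = DOCS[doc_id]
    print(doc_id, matchany_weight(doc, "foo"), doc_word_count_weight(doc, "foo"))
```

In this model id 2 gets 2 from the summed expression because it is counted once per matched field, while the document-level count stays at 1 for both id 1 and id 2. On the Sphinx side this would presumably be passed as OPTION ranker=expr('doc_word_count'), which I have not tested here.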

Answer 2

After upgrading to Sphinx 2.2.1-id64-beta (r4330), I was able to use the top() aggregate function in the custom expression ranker, like this:

MySQL [(none)]> SELECT id, val, val2, weight() FROM nptest WHERE match('@(val,val2) foo') OPTION ranker=expr('top((word_count+(lcs-1)*max_lcs)*user_weight)'), field_weights=(val=3,val2=4);
+------+-------------+------+----------+
| id   | val         | val2 | weight() |
+------+-------------+------+----------+
|    2 | foo         | foo  |        4 |
|    1 | foo         |      |        3 |
|    5 | bar bar foo | bar  |        3 |
+------+-------------+------+----------+
3 rows in set (0.00 sec)

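This way, multiple occurrences across multiple fields do not increase the overall weight, and if the fields have different weights, the weight of the highest-weighted matching field is used.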
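
For anyone who wants to check the arithmetic, here is a rough Python sketch (again a toy model, not Sphinx code) that compares sum() and top() over the per-field expression with field_weights=(val=3,val2=4). It assumes both aggregates only look at matched fields and that the query is a single keyword, so lcs is 1. The document contents are taken from the result table above, and the helper names are hypothetical.

```python
# Toy model comparing sum() and top() aggregation, NOT Sphinx code.
# Assumption: single-keyword query, so lcs == 1 in every matched field.

FIELD_WEIGHTS = {"val": 3, "val2": 4}

# Rows as they appear in the result table above.
DOCS = {
    1: {"val": "foo", "val2": ""},
    2: {"val": "foo", "val2": "foo"},
    5: {"val": "bar bar foo", "val2": "bar"},
}


def per_field_scores(doc, keyword):
    """(word_count + (lcs-1)*max_lcs) * user_weight for each matched field."""
    word_count, lcs, max_lcs = 1, 1, 1
    return [
        (word_count + (lcs - 1) * max_lcs) * FIELD_WEIGHTS[name]
        for name, text in doc.items()
        if keyword in text.split()
    ]


def rank_sum(doc, keyword):
    """Models sum(...): every matched field adds to the weight."""
    return sum(per_field_scores(doc, keyword))


def rank_top(doc, keyword):
    """Models top(...): only the best matched field counts."""
    return max(per_field_scores(doc, keyword), default=0)


for doc_id, doc in DOCS.items():
    print(doc_id, rank_sum(doc, "foo"), rank_top(doc, "foo"))
```

With top(), the model gives 4 for id 2 and 3 for ids 1 and 5, the same values as in the table. With sum(), id 2 would get 3 + 4 = 7, which is exactly the cross-field stacking I wanted to avoid.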

Many thanks to barryhunter for the great help!

SphinxSearch ranker=matchany on multiple fields
