Sphinx ignores ranking and always sorts results in the same order

Date: 2012-07-17 08:01:40

Tags: search full-text-search sphinx

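I have a Sphinx index on schools, and whenever I query it I get the same results in the same order. I have tried every conceivable combination of ranking, sorting and matching modes, and the ordering never changes.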

Here is a sample of the bad results I get:

"albany high"

Albany Junior High School | Auckland, NZ | 2001 (shouldn't be first)
Albany High School        | Albany, NY   | 2001
South Albany High School  | Albany, OR   | 2001
Albany High School        | Albany, CA   | 1001 (shouldn't be last)

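As you can see, the top-ranked school is not in a city called "Albany" and should rank lower, while the bottom-ranked "Albany High School" should rank above it. Many other search terms reproduce the same problem.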

The Sphinx index looks like this:

source schools : root
{
    sql_query = \
        SELECT schools.id, schools.name, schools.state, schools.country, schools.city, \
        (select COUNT(*) from user2school WHERE school_id = schools.id) as user_count \
        FROM schools

    sql_attr_uint       = user_count
}

index schools
{
    source                  = schools
    path                    = /var/db/sphinx/data/schools
    min_infix_len           = 3
    infix_fields            = name
}

The code that produces the results is:

$sphinx->SetMatchMode(SPH_MATCH_EXTENDED);
$sphinx->SetRankingMode(SPH_RANK_WORDCOUNT);
$sphinx->SetSortMode(SPH_SORT_RELEVANCE);

$sphinx->SetFieldWeights(array(
    'id' => 0,
    'name' => 1000,
    'city' => 0,
    'state' => 0,
    'user_count' => 0
));

How can I get Sphinx to honor custom weights? Every combination I have tried seems to fail.


EDIT:

Here is another example that produces the same ordering with completely different settings. The only option I set here is:

$sphinx->SetRankingMode(SPH_RANK_SPH04);

Results:

"albany high"

Albany Junior High School | Auckland, NZ | 3 (still shouldn't be first)
Albany High School        | Albany, NY   | 3
South Albany High School  | Albany, OR   | 2
Albany High School        | Albany, CA   | 1 (still shouldn't be last)

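As you can see, the order is unchanged. It is the same for every combination of ranking mode, sort mode and weights I have tried. Is there anything I can do to debug this?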

2 answers:

Answer 0 (score: 1)

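There may be a logic error in your application. Sphinx gives you a list of IDs, which you then use to fetch the rows from your original database. Perhaps you are not putting those rows back in Sphinx's order.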
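A minimal sketch of that fix, assuming the rows come back from a plain "WHERE id IN (...)" query, which does not preserve the order of the ID list: re-sort the fetched rows in PHP so they follow the IDs in the order Sphinx returned them. order_rows_by_ids() is a hypothetical helper name, and the sample rows stand in for a real database result.

```php
<?php
// Hypothetical helper: return DB rows in the same order as the ID list
// produced by Sphinx. SQL "WHERE id IN (...)" does not guarantee that order.
function order_rows_by_ids(array $ids, array $rows): array
{
    // Index the unordered rows by their primary key.
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row['id']] = $row;
    }

    // Walk the Sphinx IDs in rank order and pick the matching rows.
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) { // skip IDs no longer present in the DB
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Assumed example data. With the PHP sphinxapi client (without SetArrayResult(true)),
// $ids would typically be array_keys($result['matches']), which keeps rank order.
$ids  = [2, 3, 4, 1];
$rows = [
    ['id' => 1, 'name' => 'Albany Junior High School'],
    ['id' => 2, 'name' => 'Albany High School'],
    ['id' => 3, 'name' => 'South Albany High School'],
    ['id' => 4, 'name' => 'Albany High School'],
];

foreach (order_rows_by_ids($ids, $rows) as $row) {
    echo $row['id'], ' ', $row['name'], PHP_EOL;
}
```

If the database is MySQL, a similar effect can be had in SQL with ORDER BY FIELD(id, 2, 3, 4, 1); doing it in PHP keeps the query independent of the database.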

I just tried inserting your data into a test RT index (with a string attribute as well, so the data can be seen in the results):

mysql> insert into rttest values (1,'Albany Junior High School','Auckland','NZ','Albany Junior High School, Auckland, NZ');
   ... etc ...

mysql> select * from rttest where match('albany high');
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |   3267 | Albany High School, Albany, NY          |
|    3 |   3267 | South Albany High School, Albany, OR    |
|    4 |   3267 | Albany High School, Albany, CA          |
|    1 |   1304 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.15 sec)

mysql> select * from rttest where match('albany high') option ranker=sph04;
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |  12267 | Albany High School, Albany, NY          |
|    4 |  12267 | Albany High School, Albany, CA          |
|    3 |  10267 | South Albany High School, Albany, OR    |
|    1 |   6304 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.00 sec)

mysql> select * from rttest where match('albany high') option ranker=wordcount;
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |      3 | Albany High School, Albany, NY          |
|    3 |      3 | South Albany High School, Albany, OR    |
|    4 |      3 | Albany High School, Albany, CA          |
|    1 |      2 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.00 sec)

Changing the ranking mode does work.

Answer 1 (score: 0)

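The 0s in your SetFieldWeights call look odd. Either list only the fields you actually want to weight, or use 1 as the default. I suspect the 0s are causing the problem.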
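To illustrate why the zeros matter, here is a simplified PHP model of the WORDCOUNT ranker, which the Sphinx documentation describes as counting keyword occurrences per field, multiplying each count by the field weight, and summing the results. The naive lowercase tokenizer, the name/city/state layout and the wordcount_rank() name are assumptions for illustration, not Sphinx's implementation. With city weighted 0, the "Albany" city match contributes nothing, so all four schools tie. With every field at the default weight of 1, the Auckland school scores 2 and the three Albany schools score 3, which matches the ranker=wordcount output above. Note also that id (the document ID) and user_count (declared with sql_attr_uint) are probably not full-text fields, so weighting them should have no effect.

```php
<?php
// Simplified model of Sphinx's WORDCOUNT ranker (SPH_RANK_WORDCOUNT):
// per-field keyword occurrence count * field weight, summed over fields.
// The tokenizer below is a naive lowercase split, not Sphinx's own.
function wordcount_rank(array $doc, array $terms, array $weights): int
{
    $rank = 0;
    foreach ($doc as $field => $text) {
        $weight = $weights[$field] ?? 1; // unlisted fields default to weight 1
        $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        $hits = 0;
        foreach ($tokens as $token) {
            if (in_array($token, $terms, true)) {
                $hits++;
            }
        }
        $rank += $hits * $weight;
    }
    return $rank;
}

// Assumed field layout, taken from the sample rows in the question.
$schools = [
    1 => ['name' => 'Albany Junior High School', 'city' => 'Auckland', 'state' => 'NZ'],
    2 => ['name' => 'Albany High School',        'city' => 'Albany',   'state' => 'NY'],
    3 => ['name' => 'South Albany High School',  'city' => 'Albany',   'state' => 'OR'],
    4 => ['name' => 'Albany High School',        'city' => 'Albany',   'state' => 'CA'],
];
$terms = ['albany', 'high'];

$questionWeights = ['name' => 1000, 'city' => 0, 'state' => 0]; // weights from the question
$defaultWeights  = [];                                          // every field weighs 1

foreach ($schools as $id => $doc) {
    printf("%d: %d vs %d\n", $id,
        wordcount_rank($doc, $terms, $questionWeights),
        wordcount_rank($doc, $terms, $defaultWeights));
}
```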

I suspect SPH_RANK_SPH04 is the best fit for this particular case.

You also don't need your SetSelect call.