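Sphinx delta index ignores the main index

Time: 2011-09-24 17:15:49

Tags: php sphinx

I have a really weird problem: for some reason my indexes are not cooperating at all.

I have built a fully working delta index in Sphinx, complete with a cron job that keeps the whole thing up to date, and all of that works fine.

Then I query it from PHP: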

class sphinx_searcher{

function __construct(){

    $config = array('host'=>'localhost', 'port'=>9312);

    $this->sphinx = new SphinxClient();
    $this->sphinx->SetServer ( $config['host'], $config['port'] );
    $this->sphinx->SetConnectTimeout ( 1 );
}

function query(){

    $this->sphinx->SetSortMode(SPH_SORT_RELEVANCE);
    $this->sphinx->SetLimits(0, 20); // Testing first page
    $this->sphinx->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
    $this->sphinx->SetArrayResult ( true );
    $res = $this->sphinx->Query("040*", "media media_delta");

    if($res)
        return $res;
    else
        return $this->sphinx->GetLastError();

}
}

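For some reason it takes results from one index or the other (so far only ever the latter).

When I query media on its own I get document IDs 1 and 2, but when I query both indexes I only get doc ID 3, which is in the delta index.

Here is my data source configuration: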

source media
{
type            = mysql
sql_query_pre       = SET NAMES utf8
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
sql_query = \
    SELECT id, deleted, _id, uid, listing, title, description, tags, author_name, playlist, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
    FROM documents \
    WHERE id<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )

sql_field_string = tags
sql_field_string = description
sql_field_string = author_name
sql_field_string = title
sql_attr_uint = deleted
sql_attr_string = _id
sql_attr_string = uid
sql_attr_string = listing
sql_attr_uint = playlist
sql_attr_timestamp = date_uploaded
sql_ranged_throttle = 0
sql_query_info = SELECT * FROM media WHERE id=$id
sql_query_killlist = SELECT id FROM documents WHERE deleted = 0

}


source media_delta : media
{
sql_query_pre = SET NAMES utf8
sql_query = \
    SELECT id, deleted,  _id, uid, listing, title, description, tags, author_name, playlist, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
    FROM documents \
    WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}

Here is my index configuration:

index media
{
source          = media
path            = /home/sam/sphinx/var/data/media
docinfo         = extern
mlock           = 0
morphology      = stem_en, stem_ru, soundex
min_word_len        = 1
charset_type        = sbcs
min_infix_len       = 2
infix_fields        = title, tags 
enable_star     = 1
expand_keywords     = 1
html_strip      = 0
index_exact_words   = 1
}

index media_delta : media
{
source = media_delta
path = /home/sam/sphinx/var/data/media_delta
}

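I'm really confused about what I've done wrong, and I'm hoping someone can help me work out what the problem is.

Edit:

Result when not using all of the indexes: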

array(9) {
  ["error"]=> string(0) ""
  ["warning"]=> string(0) ""
  ["status"]=> int(0)
  ["fields"]=> array(4) {
    [0]=> string(5) "title"
    [1]=> string(11) "description"
    [2]=> string(4) "tags"
    [3]=> string(11) "author_name"
  }
  ["attrs"]=> array(10) {
    ["deleted"]=> int(1)
    ["_id"]=> int(7)
    ["uid"]=> int(7)
    ["listing"]=> int(7)
    ["title"]=> int(7)
    ["description"]=> int(7)
    ["tags"]=> int(7)
    ["author_name"]=> int(7)
    ["playlist"]=> int(1)
    ["date_uploaded"]=> int(2)
  }
  ["total"]=> string(1) "0"
  ["total_found"]=> string(1) "0"
  ["time"]=> string(5) "0.000"
  ["words"]=> array(1) {
    ["040*"]=> array(2) {
      ["docs"]=> string(1) "2"
      ["hits"]=> string(1) "2"
    }
  }
}

Thanks,

1 Answer:

Answer 0: (score: 3)

After working through a few possibilities, I found the problem:

sql_query_killlist = SELECT id FROM documents WHERE deleted = 0

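This says that any document with "deleted = 0" should go away, i.e. it is to be "killed".

I think the confusing part in this case is that the "hits" are still counted in the "words" array, even though the documents are killed afterwards. (The words array holds the raw numbers before any filtering, straight from the index, so any setFilter, or in this case the kill-list, will make it overestimate.)

So change it to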
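To make that mismatch visible, here is a minimal sketch that compares the keyword statistics with the final match count in a result array shaped like the one in the question. The helper name sphinx_hidden_docs is hypothetical (it is not part of the Sphinx API), and the comparison is only a rough hint that really makes sense for a single-keyword query.

```php
<?php
// Hypothetical helper, not part of the Sphinx API.
// Returns how many documents the keyword statistics report beyond the
// final match count, i.e. documents that were presumably filtered or killed.
function sphinx_hidden_docs(array $res): int
{
    $maxKeywordDocs = 0;
    foreach ($res['words'] ?? [] as $stats) {
        // "docs" is the raw per-keyword count taken from the index.
        $maxKeywordDocs = max($maxKeywordDocs, (int) $stats['docs']);
    }
    // "total_found" is the number of matches left after filtering.
    return max(0, $maxKeywordDocs - (int) $res['total_found']);
}

// Values copied from the dump in the question.
$res = [
    'total'       => '0',
    'total_found' => '0',
    'words'       => ['040*' => ['docs' => '2', 'hits' => '2']],
];

echo sphinx_hidden_docs($res), "\n"; // prints 2
```

A non-zero value here means the index knows about documents that never reached the result set, which is what pointed to the kill-list.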

WHERE deleted = 1
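For context, a sketch of where the corrected line sits in the source block from the question, with everything else left as it was. The comments are my reading of why the delta setup is affected: media_delta inherits sql_query_killlist from media, and as far as I understand a kill-list suppresses the listed IDs in the indexes that come earlier in the query's index list (here, media).

```
source media
{
    # ... other settings as in the question ...

    # List only the IDs that should be suppressed, i.e. deleted documents.
    sql_query_killlist = SELECT id FROM documents WHERE deleted = 1
}

source media_delta : media
{
    # sql_query_killlist is inherited from media unless overridden here.
    # ... sql_query_pre / sql_query as in the question ...
}
```

The indexes need to be rebuilt afterwards, since the kill-list is fetched at indexing time.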

:)

It's always the most unexpected thing!