How do I get Sphinx to index in real time?

Posted: 2012-07-24 19:04:56

Tags: mysql ruby-on-rails full-text-search sphinx thinking-sphinx

I have a Rails 3.2.6 app, and I'm using Sphinx 0.9.9 and Thinking Sphinx 2.0.12.

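I need Sphinx to update its index in real time. For example, when a user creates a new post, it should show up in search results immediately. Likewise, if they delete a post, it should stop showing up from the moment they delete it.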

I followed the documentation on delta indexing.

Following this advice, I have a cron job that runs every 20 minutes and executes bundle exec rake ts:index RAILS_ENV=production ...

  

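Turning on delta indexing does not remove the need to run a full reindex regularly; otherwise the delta index itself will grow as large as the core index, which removes the advantage of keeping them separate. It also slows down the requests to your server that change model records.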

New entries only show up after that job has run.

Here is my define_index ...

define_index do

  indexes(title)
  indexes(entry)

  has user_id
  has created_at
  has updated_at

  set_property :delta => true

end

Here is my production.sphinx.conf ...

indexer
{
}

searchd
{
  listen = 127.0.0.1:9312
  log = /opt/deployed_rails_apps/my_app/releases/20120713022228/log/searchd.log
  query_log = /opt/deployed_rails_apps/my_app/releases/20120713022228/log/searchd.query.log
  pid_file = /opt/deployed_rails_apps/my_app/releases/20120713022228/log/searchd.production.pid
}

source entry_core_0
{
  type = mysql
  sql_host = localhost
  sql_user = abc
  sql_pass = abc
  sql_db = my_app_production
  sql_query_pre = UPDATE `entries` SET `delta` = 0 WHERE `delta` = 1
  sql_query_pre = SET NAMES utf8
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query = SELECT SQL_NO_CACHE `entries`.`id` * CAST(1 AS SIGNED) + 0 AS `id` , `entries`.`title` AS `title`, `entries`.`entry` AS `entry`, `entries`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3940594292 AS `class_crc`, `entries`.`user_id` AS `user_id`, UNIX_TIMESTAMP(`entries`.`created_at`) AS `created_at`, UNIX_TIMESTAMP(`entries`.`updated_at`) AS `updated_at` FROM `entries`  WHERE (`entries`.`id` >= $start AND `entries`.`id` <= $end AND `entries`.`delta` = 0) GROUP BY `entries`.`id` ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `entries` WHERE `entries`.`delta` = 0
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = sphinx_deleted
  sql_attr_uint = class_crc
  sql_attr_uint = user_id
  sql_attr_timestamp = created_at
  sql_attr_timestamp = updated_at
  sql_query_info = SELECT * FROM `entries` WHERE `id` = (($id - 0) / 1)
}

index entry_core
{
  source = entry_core_0
  path = /opt/deployed_rails_apps/my_app/releases/20120713022228/db/sphinx/production/entry_core
  charset_type = utf-8
}

source entry_delta_0 : entry_core_0
{
  type = mysql
  sql_user = abc
  sql_pass = abc
  sql_db = my_app_production
  sql_query_pre = 
  sql_query_pre = SET NAMES utf8
  sql_query_pre = SET TIME_ZONE = '+0:00'
  sql_query = SELECT SQL_NO_CACHE `entries`.`id` * CAST(1 AS SIGNED) + 0 AS `id` , `entries`.`title` AS `title`, `entries`.`entry` AS `entry`, `entries`.`id` AS `sphinx_internal_id`, 0 AS `sphinx_deleted`, 3940594292 AS `class_crc`, `entries`.`user_id` AS `user_id`, UNIX_TIMESTAMP(`entries`.`created_at`) AS `created_at`, UNIX_TIMESTAMP(`entries`.`updated_at`) AS `updated_at` FROM `entries`  WHERE (`entries`.`id` >= $start AND `entries`.`id` <= $end AND `entries`.`delta` = 1) GROUP BY `entries`.`id` ORDER BY NULL
  sql_query_range = SELECT IFNULL(MIN(`id`), 1), IFNULL(MAX(`id`), 1) FROM `entries` WHERE `entries`.`delta` = 1
  sql_attr_uint = sphinx_internal_id
  sql_attr_uint = sphinx_deleted
  sql_attr_uint = class_crc
  sql_attr_uint = user_id
  sql_attr_timestamp = created_at
  sql_attr_timestamp = updated_at
  sql_query_info = SELECT * FROM `entries` WHERE `id` = (($id - 0) / 1)
}

index entry_delta : entry_core
{
  source = entry_delta_0
  path = /opt/deployed_rails_apps/my_app/releases/20120713022228/db/sphinx/production/entry_delta
}

index entry
{
  type = distributed
  local = entry_delta
  local = entry_core
}
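To make the symptom concrete, here is a minimal plain-Ruby model of the main + delta setup in the config above. It is only an illustration under simplifying assumptions: no Sphinx or MySQL is involved, hashes stand in for the `entries` table and the two indexes, the names MainDeltaModel, index_delta and index_full are hypothetical, and a delta entry simply shadows a core entry with the same id. It shows that a saved row is invisible until entry_delta is rebuilt (with the default delta type, Thinking Sphinx is meant to trigger that rebuild itself when a record is saved), and that only a full reindex, whose sql_query_pre resets `delta` to 0, empties the delta index again.

```ruby
# Toy model of the main + delta scheme in production.sphinx.conf.
# Assumption: plain Ruby only; hashes of id => text stand in for the
# `entries` table and for the entry_core / entry_delta indexes.
Entry = Struct.new(:id, :title, :delta)

class MainDeltaModel
  attr_reader :core, :delta_index

  def initialize
    @table = {}        # the `entries` table
    @core = {}         # entry_core
    @delta_index = {}  # entry_delta
  end

  # Saving a record flags it for the delta index (delta = 1).
  def save(id, title)
    @table[id] = Entry.new(id, title, 1)
  end

  # Like entry_delta_0: only rows WHERE delta = 1 are indexed.
  def index_delta
    @delta_index = @table.values.select { |e| e.delta == 1 }.to_h { |e| [e.id, e.title] }
  end

  # Like a full ts:index run: the core sql_query_pre resets delta to 0,
  # core takes every row, and the rebuilt delta index ends up empty.
  def index_full
    @table.each_value { |e| e.delta = 0 }
    @core = @table.values.to_h { |e| [e.id, e.title] }
    index_delta
  end

  # Like the distributed `entry` index: query both local indexes.
  # Simplification: a delta entry shadows a core entry with the same id.
  def search(word)
    @core.merge(@delta_index).select { |_, title| title.include?(word) }.keys.sort
  end
end

m = MainDeltaModel.new
m.save(1, "hello sphinx")
m.index_full
m.save(2, "hello rails")
p m.search("hello") # => [1]  (entry 2 is not in any index yet)
m.index_delta
p m.search("hello") # => [1, 2]
```

The delta rebuild is cheap because it only reads flagged rows, but every save adds to it until the next full reindex, which is exactly the growth the quoted documentation warns about.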

Any ideas what I might be doing wrong?

1 answer:

Answer 0 (score: 0)

I know this is old, but you should consider upgrading your Sphinx version and switching to RT (real-time) indexes instead of the main + delta scheme.

Link to RT indexes - Sphinx Documentation
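For comparison, here is a similarly hedged toy model of what the RT approach changes: writes go straight into the index, so there is no indexer run, no delta flag and no periodic full rebuild. Nothing here talks to Sphinx; RtIndexModel and its methods are hypothetical names. In a real app the two write methods would roughly correspond to SphinxQL REPLACE INTO / DELETE FROM statements sent to searchd from model callbacks such as after_save and after_destroy, and RT indexes need a newer Sphinx than the 0.9.9 in the question.

```ruby
# Toy model of an RT (real-time) index. Assumption: plain Ruby, a hash of
# id => text stands in for the index; no Sphinx server is involved.
class RtIndexModel
  def initialize
    @docs = {}
  end

  # Roughly what SphinxQL REPLACE INTO <rt_index> does: the document is
  # searchable as soon as the call returns.
  def replace(id, text)
    @docs[id] = text
  end

  # Roughly what SphinxQL DELETE FROM <rt_index> WHERE id = ... does.
  def delete(id)
    @docs.delete(id)
  end

  def search(word)
    @docs.select { |_, text| text.include?(word) }.keys.sort
  end
end

rt = RtIndexModel.new
rt.replace(1, "my new post")
p rt.search("post") # => [1]
rt.delete(1)
p rt.search("post") # => []
```

Both the "new post shows up immediately" and "deleted post disappears immediately" requirements fall out of this model directly, which is the argument for RT over main + delta.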