Ideas for reducing Sphinx search time

Posted: 2012-01-11 23:21:29

Tags: ruby-on-rails ruby-on-rails-3 sphinx thinking-sphinx

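I'm using the Thinking Sphinx gem, and my queries take about 45 seconds to complete (13 million records; the folder containing the indexes is 1.1 GB). I assume I have something misconfigured (I'm a first-time Sphinx user). Either way, if you see anything that looks off, please let me know. Here is my configuration: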

define_index do
  indexes :name
  indexes :summary
  indexes :tag_list

  indexes categories.name, :as => :category_name

  has "RADIANS(lat)",  :as => :latitude,  :type => :float
  has "RADIANS(lng)",  :as => :longitude,  :type => :float

  set_property :field_weights => {
    :name           => 8,
    :summary        => 6,
    :category_name  => 5,
    :tag_list       => 3
  }
  set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
  set_property :ignore_chars => %w(' -)
end

Here is a sample query:

Location.search('Restaurant',
                :geo => [0.5837843098436726,-1.9560609568879357],
                :latitude_attr => "latitude",
                :longitude_attr => "longitude",
                :with => {"@geodist" => 0.0..4000.0},
                :include => :categories,
                :page => 1,
                :per_page => 100)

My log shows:

Sphinx Query (43066.3ms)  restaurant
Sphinx  Found 467 results

I'll keep digging through the documentation and trying things!

Update: my development.sphinx.conf

indexer
{
}

searchd
{
    listen = 127.0.0.1:9312
    log = /project_path/log/searchd.log
    query_log = /project_path/log/searchd.query.log
    pid_file = /project_path/log/searchd.development.pid
}

source location_core_0
{
    type = pgsql
    sql_host = localhost
    sql_user = user
    sql_pass = pass
    sql_db = db_name
    sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE
    sql_query_pre = SET TIME ZONE 'UTC'
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE
    sql_attr_uint = sphinx_internal_id
    sql_attr_uint = sphinx_deleted
    sql_attr_uint = class_crc
    sql_attr_float = latitude
    sql_attr_float = longitude
    sql_attr_string = sphinx_internal_class
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}

index location_core
{
    source = location_core_0
    path = /project_path/db/sphinx/development/location_core
    morphology = stem_en
    charset_type = utf-8
    ignore_chars = ', -
    enable_star = 1
}

source location_delta_0 : location_core_0
{
    type = pgsql
    sql_host = localhost
    sql_user = user
    sql_pass = pass
    sql_db = db_name
    sql_query_pre = 
    sql_query_pre = SET TIME ZONE 'UTC'
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE
    sql_attr_uint = sphinx_internal_id
    sql_attr_uint = sphinx_deleted
    sql_attr_uint = class_crc
    sql_attr_float = latitude
    sql_attr_float = longitude
    sql_attr_string = sphinx_internal_class
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}

index location_delta : location_core
{
    source = location_delta_0
    path = /project_path/db/sphinx/development/location_delta
}

index location
{
    type = distributed
    local = location_delta
    local = location_core
}

2 answers:

Answer 0: (score: 0)

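I don't know exactly why the search is so slow, but I would start by stripping the query down and then adding the complexity back one option at a time, to see whether any particular piece is responsible. So, first: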

Location.search('Restaurant')

Then maybe:

Location.search('Restaurant', :per_page => 100)

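And so on. Don't forget that the :field_weights in your index definition will also have an effect.

All that said, I don't see anything particularly odd in what you're doing, and a 43-second search (or anything close to it) is something I haven't come across before.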
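
The step-by-step approach above can be made a little more systematic by timing each variant. The following is only a rough sketch: `time_steps` is a hypothetical helper (not part of Thinking Sphinx), and the `Location.search` calls in the comments assume the model from the question.

```ruby
require "benchmark"

# Hypothetical helper: runs each labelled step once and returns
# { label => elapsed milliseconds }, in the order the steps were given.
def time_steps(steps)
  steps.each_with_object({}) do |(label, step), timings|
    timings[label] = Benchmark.realtime { step.call } * 1000.0
  end
end

# Inside the Rails app the steps would wrap real searches, for example
# (to_a is there to force the lazy search to actually run):
#
#   timings = time_steps(
#     "term only"  => -> { Location.search('Restaurant').to_a },
#     "+ per_page" => -> { Location.search('Restaurant', :per_page => 100).to_a }
#   )
#   timings.each { |label, ms| puts format("%-12s %10.1fms", label, ms) }
```

Adding one option per step (:geo, :with, :include, and so on) should make it clearer which one, if any, accounts for the jump in time.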

Answer 1: (score: 0)

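I found my problem: the records live in an STI table, but I only want to index those of type Location (Location has no descendants). Of the 13 million records in that table, 99.99984% (seriously) are of type Location. The SELECT DISTINCT type FROM business_entities query was taking far too long (even with an index). The tricky part was noticing this, because the log reports a Sphinx query lasting 84 seconds, when the real problem was the rogue SQL queries: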

SQL (43647.1ms)  SELECT DISTINCT type FROM business_entities
SQL (39857.7ms)  SELECT DISTINCT type FROM business_entities

Sphinx Query (84173.0ms)  restaurant
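
As a quick sanity check on those figures, here is a small plain-Ruby sketch that adds up the SQL timings and compares them with the reported Sphinx time. It assumes the two SQL queries ran inside the span that was logged as the Sphinx query; that is a reading of the log, not something the log itself states. `timings_ms` is a hypothetical helper name.

```ruby
# Log lines copied from the output above.
LOG = <<~LOG
  SQL (43647.1ms)  SELECT DISTINCT type FROM business_entities
  SQL (39857.7ms)  SELECT DISTINCT type FROM business_entities
  Sphinx Query (84173.0ms)  restaurant
LOG

# Pull the millisecond figure out of every line that starts with the prefix.
def timings_ms(log, prefix)
  log.each_line
     .select { |line| line.start_with?(prefix) }
     .map { |line| line[/\((\d+(?:\.\d+)?)ms\)/, 1].to_f }
end

sql_ms    = timings_ms(LOG, "SQL").sum.round(1)
sphinx_ms = timings_ms(LOG, "Sphinx Query").sum.round(1)
other_ms  = (sphinx_ms - sql_ms).round(1)

puts "SQL: #{sql_ms}ms, Sphinx Query: #{sphinx_ms}ms, remainder: #{other_ms}ms"
```

The two SQL queries account for roughly 83.5 of the 84.2 seconds, which leaves well under a second for everything else.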

So I patched Thinking Sphinx in an initializer to return the only type I care about:

module ThinkingSphinx
  class Source
    module SQL
      def type_values
        ['Location']
      end
    end
  end
end

https://gist.github.com/1603565
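
For anyone unsure why reopening the module is enough, here is a self-contained illustration of the mechanism. Everything under `Demo` is a hypothetical stand-in, not Thinking Sphinx's real code; the `sleep` merely stands in for an expensive lookup of distinct types.

```ruby
# Stand-in for the library's namespace; every name under Demo is hypothetical.
module Demo
  class Source
    module SQL
      # Pretend this is the expensive lookup of distinct STI types.
      def type_values
        sleep 0.2
        ['Location', 'Group']
      end
    end

    include SQL
  end
end

# The "initializer": reopen the same module and redefine the method.
# Classes that already include the module pick up the new definition.
module Demo
  class Source
    module SQL
      def type_values
        ['Location']
      end
    end
  end
end

p Demo::Source.new.type_values
```

The trade-off is that a hard-coded list like this will silently go stale if another indexed type is ever added to the table.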