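I'm using the Thinking Sphinx gem, and my queries take about 45 seconds to complete (13 million records; the folder containing the indexes is 1.1 GB). I assume I've misconfigured something (first-time Sphinx user). If you see anything that looks off, please let me know. Here's my configuration: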
define_index do
  indexes :name
  indexes :summary
  indexes :tag_list
  indexes categories.name, :as => :category_name
  has "RADIANS(lat)", :as => :latitude, :type => :float
  has "RADIANS(lng)", :as => :longitude, :type => :float
  set_property :field_weights => {
    :name => 8,
    :summary => 6,
    :category_name => 5,
    :tag_list => 3
  }
  set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
  set_property :ignore_chars => %w(' -)
end
Here's an example query:
Location.search('Restaurant',
  :geo => [0.5837843098436726, -1.9560609568879357],
  :latitude_attr => "latitude",
  :longitude_attr => "longitude",
  :with => {"@geodist" => 0.0..4000.0},
  :include => :categories,
  :page => 1,
  :per_page => 100)
My log shows:
Sphinx Query (43066.3ms) restaurant
Sphinx Found 467 results
I'll keep digging into the docs and trying things!
Update: my development.sphinx.conf
indexer
{
}
searchd
{
listen = 127.0.0.1:9312
log = /project_path/log/searchd.log
query_log = /project_path/log/searchd.query.log
pid_file = /project_path/log/searchd.development.pid
}
source location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_core
{
source = location_core_0
path = /project_path/db/sphinx/development/location_core
morphology = stem_en
charset_type = utf-8
ignore_chars = ', -
enable_star = 1
}
source location_delta_0 : location_core_0
{
type = pgsql
sql_host = localhost
sql_user = user
sql_pass = pass
sql_db = db_name
sql_query_pre =
sql_query_pre = SET TIME ZONE 'UTC'
sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type"
sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE
sql_attr_uint = sphinx_internal_id
sql_attr_uint = sphinx_deleted
sql_attr_uint = class_crc
sql_attr_float = latitude
sql_attr_float = longitude
sql_attr_string = sphinx_internal_class
sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0) / 1)
}
index location_delta : location_core
{
source = location_delta_0
path = /project_path/db/sphinx/development/location_delta
}
index location
{
type = distributed
local = location_delta
local = location_core
}
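Answer 0 (score: 0)
I don't know exactly why the search is so slow, but I'd start by simplifying the query and then adding the complexity back one piece at a time, to see whether any particular option is the cause. So, first: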
Location.search('Restaurant')
Then maybe:
Location.search('Restaurant', :per_page => 100)
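And so on. Don't forget that the :field_weights in your index definition will also have an effect.
All that said, I don't see anything particularly odd in what you're doing, and a 43-second search (or anything close to it) is not something I've come across before.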
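The step-by-step approach described above could be sketched roughly as follows. This is only an illustration: time_incrementally is a hypothetical helper, not part of Thinking Sphinx, and the commented usage assumes that calling to_a forces the lazily loaded search results.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block once per step, each time with one
# more set of options merged in, and returns [[label, seconds], ...]
# so the step that introduces the slowdown stands out.
def time_incrementally(steps)
  options = {}
  steps.map do |label, extra|
    options = options.merge(extra)
    seconds = Benchmark.realtime { yield options }
    [label, seconds]
  end
end

# Possible usage in a Rails console, with the options from the question
# (assumes to_a forces the search to run):
#
# steps = [
#   ['bare',     {}],
#   ['per_page', { :page => 1, :per_page => 100 }],
#   ['include',  { :include => :categories }],
#   ['geo',      { :geo => [0.5837843098436726, -1.9560609568879357],
#                  :latitude_attr => 'latitude',
#                  :longitude_attr => 'longitude',
#                  :with => { '@geodist' => 0.0..4000.0 } }]
# ]
# time_incrementally(steps) { |opts| Location.search('Restaurant', opts).to_a }
#   .each { |label, s| puts format('%-10s %.1f ms', label, s * 1000) }
```

If one step jumps from milliseconds to tens of seconds, the option added in that step is the first thing to look at.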
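Answer 1 (score: 0)
I found my problem. The records happen to be in an STI table, but I only want to index those of type Location (Location has no descendants). Of the 13 million records in that table, 99.99984% (seriously) are of type Location. The SELECT DISTINCT type FROM business_entities query was taking far too long (even with an index). The tricky part was noticing this, because the log reports a Sphinx query lasting 84 seconds, when the real problem is the SQL queries that precede it: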
SQL (43647.1ms) SELECT DISTINCT type FROM business_entities
SQL (39857.7ms) SELECT DISTINCT type FROM business_entities
Sphinx Query (84173.0ms) restaurant
So I patched Thinking Sphinx in an initializer to return the only type I care about:
module ThinkingSphinx
  class Source
    module SQL
      # Skip the slow SELECT DISTINCT type query; only Location is indexed.
      def type_values
        ['Location']
      end
    end
  end
end
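Going back to the log excerpt above, here is a small sketch of how the reported Sphinx time splits up. It assumes, as described, that the two SQL lines ran inside the timed Sphinx call; sphinx_time_breakdown is a hypothetical helper written just for this illustration.

```ruby
# Log lines copied from the excerpt above.
LOG = <<~LOG
  SQL (43647.1ms) SELECT DISTINCT type FROM business_entities
  SQL (39857.7ms) SELECT DISTINCT type FROM business_entities
  Sphinx Query (84173.0ms) restaurant
LOG

# Hypothetical helper: sums the SQL durations and subtracts them from the
# reported Sphinx duration (assumption: the SQL ran inside that timing).
def sphinx_time_breakdown(log)
  sql_ms = log.scan(/^SQL \((\d+(?:\.\d+)?)ms\)/).flatten.map(&:to_f).sum
  total_ms = log[/^Sphinx Query \((\d+(?:\.\d+)?)ms\)/, 1].to_f
  {
    :sql_ms => sql_ms.round(1),
    :total_ms => total_ms,
    :sphinx_ms => (total_ms - sql_ms).round(1)
  }
end

p sphinx_time_breakdown(LOG)
```

Under that assumption, the two DISTINCT queries account for about 83.5 of the 84.2 seconds, leaving well under a second for everything else.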