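We are using ElasticSearch to search 15M records. The records are split across indices of different sizes; some of the indices hold 1.5M records.
We have enough memory (80 GB), and the whole 60 GB of index data fits into RAM. According to our statistics, the query execution time reported by ElasticSearch is 7 ms, but we only receive the result from ElasticSearch after 300 ms. What is wrong here? Where can we look, and where is our time going?
ES settings: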
2 Nodes on 2 different hosts
Each index has 1 primary shard; we have 2 shards per index
3,762 Total Shards
3,762 Successful Shards
85 Indices
20,347,989 Documents
40.5GB Size
elasticsearch.yml
index.cache.field.type: soft
indices.cache.filter.size: 50%
index.fielddata.cache: soft
index.cache.field.expire: 60m
indices.fielddata.cache.size: 50%
indices.fielddata.cache.expire : 60m
index.store.type: mmapfs
transport.tcp.compress: true;
bootstrap.mlockall: true
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
Example: we have an index for the country DE with 1.5M documents. This index has 2 shards.
ES startup:
/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms32g -Xmx32g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-1.1.2.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch
OS:
24 Cores
80 GB of RAM
60 GB are used
Disk space: 1,2 TB
350 GB used / 780GB free
Disc type: SAS
Mysql is running also on this machine
Example query: when searching for a city, we pass the location_id to ES:
{
  "query": {
    "match_all": {}
  },
  "sort": {},
  "facets": {
    "location_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }]
        }
      },
      "terms": {
        "field": "location_facet",
        "all_terms": true,
        "size": 100,
        "script": "doc['geo_point'].empty ? null : ceil(doc['geo_point'].arcDistanceInKm(-33.42628, -70.56656)) + '|' + doc['location_facet'].value\n + '|' + doc['location_id'].value"
      }
    },
    "company_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }, {
            "terms": {
              "location_id": [
                25717
              ]
            }
          }]
        }
      },
      "terms": {
        "field": "company_facet",
        "order": "count",
        "script": "doc['company_facet'].value + '|' + doc['company_id'].value"
      }
    },
    "job_type_id": {
      "facet_filter": {
        "bool": {
          "must": [{
            "terms": {
              "sponsored": [
                1,
                0
              ]
            }
          }, {
            "geo_distance": {
              "distance": "50km",
              "geo_point": {
                "lat": -33.42628,
                "lon": -70.56656
              }
            }
          }]
        }
      },
      "terms": {
        "field": "jobtype_facet",
        "order": "term",
        "all_terms": true
      }
    }
  },
  "filter": {},
  "size": 10,
  "from": 0,
  "explain": false,
  "highlight": {
    "order": "score",
    "require_field_match": false,
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fields": {
      "description": {
        "type": "fvh",
        "force_source": true,
        "no_match_size": 200,
        "index_options": "offsets",
        "fragment_size": 200,
        "number_of_fragments": 2,
        "matched_fields": [
          "description",
          "title"
        ]
      }
    }
  }
}
Response time for this query: > 400 ms, which is very slow. We also disabled the facets, but nothing changed.
Answer 0 (score: 0)
For a single point, a "geo_bounding_box" filter may be faster than "geo_distance".
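To try this, the 50km radius from the question first has to be turned into box corners. The following is a minimal sketch in Python, not a tested drop-in replacement: the helper names `bounding_box` and `geo_bounding_box_filter` are made up for illustration, the Earth is treated as a sphere with a 6371 km radius, and the poles and the antimeridian are ignored. The resulting dict would take the place of each "geo_distance" clause in the query above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def bounding_box(lat, lon, distance_km):
    """Approximate the box around a circle of distance_km centred on (lat, lon).

    Hypothetical helper: valid away from the poles and the antimeridian only.
    """
    # Angular size of the distance along a meridian, in degrees.
    dlat = math.degrees(distance_km / EARTH_RADIUS_KM)
    # Longitude degrees get narrower towards the poles, so scale by cos(lat).
    dlon = math.degrees(
        distance_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    )
    top_left = {"lat": lat + dlat, "lon": lon - dlon}
    bottom_right = {"lat": lat - dlat, "lon": lon + dlon}
    return top_left, bottom_right


def geo_bounding_box_filter(field, lat, lon, distance_km):
    """Build a geo_bounding_box filter clause as a plain dict."""
    top_left, bottom_right = bounding_box(lat, lon, distance_km)
    return {
        "geo_bounding_box": {
            field: {"top_left": top_left, "bottom_right": bottom_right}
        }
    }


# Same field, centre and radius as in the question's facet filters.
box_filter = geo_bounding_box_filter("geo_point", -33.42628, -70.56656, 50)
```

Note that a box is not a circle: its corners lie farther than 50km from the centre, so the facet counts can differ slightly from the "geo_distance" version. Whether it is actually faster needs to be measured on the real indices.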