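Elasticsearch filter logic

Time: 2014-02-03 21:34:48

Tags: elasticsearch

I get no results when filtering by category. Removing the category filter works.

After a lot of experimentation, this is my query: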

"query": {
    "filtered": {
        "query": {
            "multi_match": {
                "query": "*",
                "zero_terms_query": "all",
                "operator": "and",
                "fields": [
                    "individual_name.name^1.3",
                    "organisation_name.name^1.8",
                    "profile",
                    "accreditations"
                ]
            }
        },
        "filter": {
            "bool": {
                "must": [{
                    "term": { "categories" : "9" }
                }]
            }
        }
    }
}

Here is some sample data:

{
_index: providers
_type: provider
_id: 3
_version: 1
_score: 1
_source: {
    locations: 
    id: 3
    profile: <p>Dr Murray is a (blah blah)</p>
    cost_id: 3
    ages: null
    nationwide: no
    accreditations: null
    service_types: null
    individual_name: Dr Linley Murray
    organisation_name: Crawford Medical Centre
    languages: {"26":26}
    regions: {"1":"Auckland"}
    districts: {"8":"Manukau City"}
    towns: {"2":"Howick"}
    categories: {"10":10}
    sub_categories: {"47":47}
    funding_regions: {"7":7}
}
}

These are my index settings:

$index_settings = array(
    'number_of_shards' => 5,
    'number_of_replicas' => 1,
    'analysis' => array(
        'char_filter' => array(
            'wise_mapping' => array(
                'type'     => 'mapping',
                'mappings' => array('\'=>', '.=>', ',=>')
            )
        ),
        'filter' => array(
            'wise_ngram'   => array(
                'type'     => 'edgeNGram',
                'min_gram' => 5,
                'max_gram' => 10
            )
        ),
        'analyzer' => array(
            'index_analyzer'  => array(
                'type'        => 'custom',
                'tokenizer'   => 'standard',
                'char_filter' => array('html_strip', 'wise_mapping'),
                'filter'      => array('standard', 'wise_ngram')
            ),
            'search_analyzer'  => array(
                'type'        => 'custom',
                'tokenizer'   => 'standard',
                'char_filter' => array('html_strip', 'wise_mapping'),
                'filter'      => array('standard', 'wise_ngram')
            ),
        )
    )
);

Is there a better way to filter/search this content? When I use snowball instead of nGram, the filter works. Why is that?

1 answer:

Answer 0 (score: 2):

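You are querying the categories field for the term 9, but the categories field is actually an object:

{ "categories": { "10": 10 }}

So your filter should look like this:

{ "term": { "categories.9": 9 }}

Why are you specifying categories this way? You will end up with a new field for every category, which you don't want.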
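To see why the filter on the bare field name finds nothing, here is a minimal Python sketch that imitates how a nested object turns into dotted field paths. It is a simulation, not Elasticsearch code: `flatten` and `term_matches` are hypothetical helpers, and storing the category ids as a plain array is only an assumption about what a simpler layout might look like.

```python
def flatten(source, prefix=""):
    """Flatten nested dicts into dotted paths, roughly how object fields are addressed."""
    fields = {}
    for key, value in source.items():
        path = prefix + key
        if isinstance(value, dict):
            fields.update(flatten(value, prefix=path + "."))
        else:
            fields[path] = value
    return fields


def term_matches(fields, field, value):
    """Simplified term filter: exact match against the value(s) stored under one field."""
    stored = fields.get(field)
    if stored is None:
        return False
    values = stored if isinstance(stored, list) else [stored]
    return value in values


# Layout from the sample document: one sub-field per category id.
flat_object = flatten({"categories": {"10": 10}})

# Hypothetical alternative layout: a plain array of category ids.
flat_array = flatten({"categories": [10]})

print(flat_object)                                      # only "categories.10" exists
print(term_matches(flat_object, "categories", 10))      # nothing stored under "categories"
print(term_matches(flat_object, "categories.10", 10))   # the dotted path does match
print(term_matches(flat_array, "categories", 10))       # the array layout keeps one field
```

With the object layout, every new category id creates another dotted field, which is the field explosion described above. With an array, a single term filter on the one field would be enough.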

There is another problem in the query part. You are using multi_match to query multiple fields, with operator set to and. A query for "brown fox":

{ "multi_match": {
    "query": "brown fox",
    "operator": "and",
    "fields": [ "foo", "bar"]
}}

would be rewritten as:

{ "dis_max": {
    "queries": [
        { "match": { "foo": { "query": "brown fox", "operator": "and" }}},
        { "match": { "bar": { "query": "brown fox", "operator": "and" }}}
    ]
}}

In other words: all terms must be present in the same field, not spread across any of the listed fields! This is clearly not what you want.
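The difference can be shown with a small Python simulation. This is only a sketch under simplifying assumptions: text is split on whitespace, scoring is ignored, and the function names are hypothetical rather than anything from Elasticsearch.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def match_and(field_text, query):
    """One match clause with operator "and": every term must be in this field."""
    return tokens(query) <= tokens(field_text)


def multi_match_and(doc, fields, query):
    """Per-field rewrite: the document matches if any ONE field holds all the terms."""
    return any(match_and(doc.get(f, ""), query) for f in fields)


def any_field_and(doc, fields, query):
    """Alternative semantics: each term may come from any of the listed fields."""
    combined = set()
    for f in fields:
        combined |= tokens(doc.get(f, ""))
    return tokens(query) <= combined


fields = ["foo", "bar"]
split_doc = {"foo": "quick brown dog", "bar": "lazy fox"}   # terms spread over two fields
same_doc = {"foo": "the brown fox", "bar": "something else"}

print(multi_match_and(split_doc, fields, "brown fox"))  # False: no single field has both
print(any_field_and(split_doc, fields, "brown fox"))    # True: terms found across fields
print(multi_match_and(same_doc, fields, "brown fox"))   # True: both terms are in "foo"
```

In the asker's query, a search such as a person's name plus an organisation name would behave like `split_doc`: each term exists somewhere, but no single field contains them all.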

This is a very hard problem to solve. In fact, in v1.1.0 we will be adding new functionality to the multi_match query, which will help enormously with this situation.

You can read about the new functionality on this page.