How can I improve relevance in Elasticsearch?

Asked: 2015-07-30 11:56:47

Tags: php elasticsearch

This is what my mapping looks like:

$arr = [
    'index' => 'test1',
    'body' => [
        'settings' => [
            'analysis' => [
                'analyzer' => [
                    'name_analyzer' => [
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => [
                            'lowercase',
                            'asciifolding',
                            'word_delimiter'
                        ]
                    ]
                ]
            ]
        ],
        'mappings' => [
            'info' => [
                'properties' => [
                    // This field is analyzed.
                    'Name' => [
                        'type' => 'string',
                        'fields' => [
                            // The "raw" subfield of Name is not analyzed, which avoids the
                            // known issue of buckets being split on spaces.
                            'raw' => [
                                'type' => 'string',
                                'index' => 'not_analyzed'
                            ]
                        ]
                    ],
                    'Address' => [
                        'type' => 'string',
                        'index' => 'analyzed',
                        'analyzer' => 'name_analyzer'
                    ]
                ]
            ]
        ]
    ]
];

This is my query:

$query['index'] = 'test1';
$query['type']  = 'info';
// The query also works without the bool/should wrapper.
$query['body'] = [
    'query' => [
        'bool' => [
            'should' => [
                'query_string' => [
                    'fields' => ['Name'],
                    'query' => 'sa*',
                    'analyze_wildcard' => true
                ]
            ]
        ]
    ],
    // Return no hits, only the aggregation buckets.
    'size' => 0,
    'aggregations' => [
        'actor' => [
            'terms' => [
                'field' => 'Name.raw',
                'size' => 10
            ]
        ]
    ]
];

My output is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "actor": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Salma Hayak",
          "doc_count": 1
        },
        {
          "key": "Salman Khan",
          "doc_count": 1
        },
        {
          "key": "Salman Shaikh",
          "doc_count": 1
        }
      ]
    }
  }
}

What I want: Salman Khan is searched for far more often than Salma Hayak, so when a user searches for "sa", say, they should see Salman Khan first rather than Salma Hayak.
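
To make the requirement concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes every document carries a hypothetical numeric field `search_count` recording how often that actor has been searched for; no such field exists in the mapping above, and the sub-aggregation name `popularity` is also made up for illustration. The terms aggregation is then ordered by that metric instead of by the default document count.

```php
<?php
// Sketch only: `search_count` is a hypothetical numeric field holding how often
// each actor has been searched for. It is NOT part of the mapping shown earlier.
$query = [
    'index' => 'test1',
    'type'  => 'info',
    'body'  => [
        'query' => [
            'query_string' => [
                'fields' => ['Name'],
                'query' => 'sa*',
                'analyze_wildcard' => true,
            ],
        ],
        // Return no hits, only the aggregation buckets.
        'size' => 0,
        'aggregations' => [
            'actor' => [
                'terms' => [
                    'field' => 'Name.raw',
                    'size' => 10,
                    // Order buckets by the sub-aggregation below, not by doc_count.
                    'order' => ['popularity' => 'desc'],
                ],
                'aggregations' => [
                    // Single-value metric the terms aggregation sorts on.
                    'popularity' => [
                        'max' => ['field' => 'search_count'],
                    ],
                ],
            ],
        ],
    ],
];

// Print the request body to inspect the JSON that would be sent.
echo json_encode($query['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

With something like this, the "Salman Khan" bucket would come before "Salma Hayak" whenever its `search_count` is higher, but the counter itself would have to be maintained somewhere else, for example by updating the document each time a name is selected.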

Can anyone help me with this?

0 answers:

No answers yet