Ignore ASCII characters in Elasticsearch

Date: 2015-02-18 13:27:15

Tags: php elasticsearch

How can I ignore ASCII characters in Elasticsearch? I have read http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html but I still don't know how to apply it.

I am using the PHP package.

public function createIndex()
{
    $indexParams['index'] = $this->data['index'];

    $mapping = [
        '_source'    => [
            'enabled' => true
        ],
        'properties' => [
            'history.name'  => [
                'type'  => 'string',
                '_boost' => 0.2
            ]
        ]
    ];
    $settings = [
        "analysis" => [
            "analyzer" => [
                "default" => [
                    "tokenizer" => "standard",
                    "filter" => ["standard", "asciifolding"]
                ]
            ]
        ]
    ];
    $indexParams['body']['mappings'][$this->data['type']] = $mapping;
    $indexParams['body']['settings'][$this->data['type']] = $settings;

    $this->es->client->indices()->create($indexParams);
}

But this still does not ignore accented characters.

Thanks,

2 answers:

Answer 0 (score: 1)

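A few corrections and suggestions:

  • I prefer to set an explicit analyzer rather than change the default; it leads to fewer surprises later. So in your example I explicitly set analyzer: ascii_folding on the field.
  • Accordingly, I renamed the analyzer from default to ascii_folding.
  • Finally, settings are per index, not per type. The JSON structure is: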

    {
      "settings" : {
        "analysis" : {}
      },
      "mappings" : {
        "my_type" : {}
      }
    }
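
To make the difference concrete, here is a minimal sketch that prints the request body in both shapes: the one the question's code builds (settings keyed by the type name) and the per-index one described above. The variable names `$perType` and `$perIndex` and the type name `my_type` are placeholders chosen for illustration; no Elasticsearch client is needed to run it. In the first shape the `analysis` block is presumably not where Elasticsearch looks for it, which would explain why the analyzer never took effect.

```php
<?php
// Placeholder values standing in for the ones used in the question.
$type = 'my_type';
$settings = [
    'analysis' => [
        'analyzer' => [
            'default' => [
                'tokenizer' => 'standard',
                'filter'    => ['standard', 'asciifolding'],
            ],
        ],
    ],
];

// Shape built by the question's code: "analysis" nested under the type name.
$perType = ['settings' => [$type => $settings]];

// Shape described above: "analysis" sits directly under "settings".
$perIndex = ['settings' => $settings];

// Print both bodies as JSON so the nesting can be compared side by side.
echo json_encode($perType), PHP_EOL;
echo json_encode($perIndex), PHP_EOL;
```

Only the second body has the `settings.analysis` path shown in the JSON structure above.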
    

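Edit: I replaced the old example with code that has been tested and runs. Some values (index, type, etc.) are hard-coded, but otherwise it is the same. It returns the document as a hit... there must be something else wrong with your query.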

$indexParams['index'] = 'test';
$mapping = [
    '_source'    => [
        'enabled' => true
    ],
    'properties' => [
        'history.name'  => [
            'type'  => 'string',
            '_boost' => 0.2,
            'analyzer' => 'ascii_folding'
        ]
    ]
];
$settings = [
    "analysis" => [
        "analyzer" => [
            "ascii_folding" => [
                "tokenizer" => "standard",
                "filter" => ["standard", "asciifolding"]
            ]
        ]
    ]
];
$indexParams['body']['mappings']['test'] = $mapping;
$indexParams['body']['settings'] = $settings;

// create index and wait for yellow
$client->indices()->create($indexParams);
$client->cluster()->health(['wait_for_status' => 'yellow']);


//Index your document, refresh to make it visible
$params = [
    'index' => 'test',
    'type' => 'test',
    'id' => 1,
    'body' => [
        'history.name' => 'Nicôlàs Wîdàrt'
    ]
];
$client->index($params);
$client->indices()->refresh();

// Now search for it
$params = [
    'index' => 'test',
    'type' => 'test',
    'body' => [
        'query' => [
            'match' => [
                'history.name' => 'Nicolas'
            ]
        ]
    ]
];
$results = $client->search($params);
print_r($results);

This returns the single document as a hit:

Array
(
    [took] => 3
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 5
            [successful] => 5
            [failed] => 0
        )
    [hits] => Array
        (
            [total] => 1
            [max_score] => 0.19178301
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => test
                            [_type] => test
                            [_id] => 1
                            [_score] => 0.19178301
                            [_source] => Array
                                (
                                    [history.name] => Nicôlàs Wîdàrt
                                )
                        )
                )
        )
)
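
The hit is expected because the same analyzer runs on the indexed text and on the query text, so both sides end up with the folded token. The sketch below is a rough imitation of that idea in plain PHP, not Elasticsearch's implementation: `foldTokens` is a hypothetical helper, the folding table covers only a handful of characters, and whitespace splitting stands in for the standard tokenizer.

```php
<?php
// Hypothetical helper for illustration only; the real asciifolding filter
// covers far more characters than this hand-picked table.
function foldTokens(string $text): array
{
    $map = ['ô' => 'o', 'à' => 'a', 'î' => 'i', 'é' => 'e'];

    // Split on whitespace as a stand-in for the standard tokenizer.
    $tokens = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // Replace each accented character with its unaccented counterpart.
    return array_map(function ($token) use ($map) {
        return strtr($token, $map);
    }, $tokens);
}

$indexed = foldTokens('Nicôlàs Wîdàrt'); // tokens as they would be indexed
$queried = foldTokens('Nicolas');        // tokens produced from the query

// A match means at least one query token equals an indexed token.
var_dump(count(array_intersect($indexed, $queried)) > 0);
```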

Answer 1 (score: 0)

I wonder whether your PHP script is correct (I am not a PHP developer). I would probably have written:

$indexParams['body']['settings'][$this->data['index']] = $settings;