If I need to remove certain keywords and then strip all spaces from the string during index-time analysis, I use these filters:
'analysis' => array(
    'filter' => array(
        'whitespace_remove' => array(
            'type' => 'pattern_replace',
            'pattern' => ' ',
            'replacement' => ''
        ),
        'my_stop' => array(
            'type' => 'stop',
            'stopwords' => array('bad', 'horrible', 'useless')
        ),
        'edge' => array(
            'type' => 'edge_ngram',
            'min_gram' => '1',
            'max_gram' => '5'
        )
    ),
and this analyzer:
'keyword_space_ngram' => array(
    'type' => 'custom',
    'tokenizer' => 'keyword',
    'filter' => array(
        'lowercase',
        'my_stop',
        'whitespace_remove',
        'edge'
    )
)
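How can I make sure the filters are applied in this order: convert to lowercase, remove the keywords, remove the spaces, and finally run the n-gram analysis?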
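To see what this chain actually does, here is a rough pure-Python emulation of it. It is an illustration under simplifying assumptions, not Elasticsearch code: each function stands in for one component (names mirror the filters above), and they are applied in the order listed in the filter array.

```python
# Pure-Python emulation of the 'keyword_space_ngram' chain (not Elasticsearch itself).
STOPWORDS = {"bad", "horrible", "useless"}  # same list as 'my_stop'


def keyword_tokenizer(text):
    # The keyword tokenizer emits the entire input as a single token.
    return [text]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def my_stop(tokens):
    # A stop filter drops a token only if the WHOLE token is a stopword.
    return [t for t in tokens if t not in STOPWORDS]


def whitespace_remove(tokens):
    # pattern_replace with pattern ' ' and replacement ''.
    return [t.replace(" ", "") for t in tokens]


def edge(tokens, min_gram=1, max_gram=5):
    # edge_ngram: prefixes of each token, min_gram..max_gram characters long.
    return [t[:n] for t in tokens for n in range(min_gram, min(max_gram, len(t)) + 1)]


def analyze(text):
    tokens = keyword_tokenizer(text)
    for step in (lowercase, my_stop, whitespace_remove, edge):
        tokens = step(tokens)
    return tokens


print(analyze("Bad angry man"))  # ['b', 'ba', 'bad', 'bada', 'badan']
print(analyze("bad"))            # []
```

The order is respected, but "bad" survives in the first call because my_stop only ever sees the single token "bad angry man".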
Answer:
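Token filters are applied in the order they appear in the analyzer's filter array, so your chain already runs lowercase, then my_stop, then whitespace_remove, then edge. The catch is that the keyword tokenizer emits the whole input as one token, so my_stop only removes it when the entire string is a stopword. Instead, you can use custom char_filters at index time to remove the stopwords and the whitespace before tokenization: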
{
  "analysis": {
    "char_filter": {
      "whitespace_remove": {
        "type": "pattern_replace",
        "pattern": "\\s+",
        "replacement": ""
      },
      "custom_stop_words_char_filter": {
        "type": "mapping",
        "mappings": [
          "bad => ",
          "horrible => ",
          "useless => "
        ]
      }
    },
    "analyzer": {
      "custom_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": ["lowercase", "asciifolding"],
        "char_filter": ["custom_stop_words_char_filter", "whitespace_remove"]
      }
    }
  }
}
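A rough pure-Python emulation of this analyzer, to show the order in which the pieces run. It is an approximation, not Elasticsearch: the function names are hypothetical stand-ins, the mapping is done with plain string replacement, and asciifolding is left out.

```python
import re

# Hypothetical stand-ins for the two char filters; char filters see the raw string.
STOP_MAPPINGS = {"bad": "", "horrible": "", "useless": ""}


def custom_stop_words_char_filter(text):
    # Like a mapping char filter: replace each key wherever it occurs (case-sensitive).
    for key, value in STOP_MAPPINGS.items():
        text = text.replace(key, value)
    return text


def whitespace_remove(text):
    # pattern_replace with \s+ (written "\\s+" inside JSON) and an empty replacement.
    return re.sub(r"\s+", "", text)


def custom_analyzer(text):
    # 1. Char filters, in the listed order.
    for char_filter in (custom_stop_words_char_filter, whitespace_remove):
        text = char_filter(text)
    # 2. Whitespace tokenizer (no whitespace is left to split on at this point).
    tokens = text.split()
    # 3. Token filters: lowercase only in this sketch.
    return [t.lower() for t in tokens]


print(custom_analyzer("bad angry man"))  # ['angryman']
print(custom_analyzer("Bad angry man"))  # ['badangryman'], 'Bad' is not mapped
```

The second call shows the case-sensitivity caveat: the mapping runs before the lowercase token filter.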
For example, this turns "bad angry man" into "angryman".
To add the edge_ngram filter, define edge under "filter" as in your question and append it to the end of the analyzer's filter array.
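A sketch of the resulting settings, written as a Python dict that serializes to the JSON request body. The dict form and the edge_ngrams helper are illustrative assumptions; the edge definition is the one from the question (with integer gram sizes).

```python
import json

settings = {
    "analysis": {
        "char_filter": {
            "whitespace_remove": {"type": "pattern_replace", "pattern": "\\s+", "replacement": ""},
            "custom_stop_words_char_filter": {
                "type": "mapping",
                "mappings": ["bad => ", "horrible => ", "useless => "],
            },
        },
        # Token filter definitions: 'edge' is the edge_ngram filter from the question.
        "filter": {"edge": {"type": "edge_ngram", "min_gram": 1, "max_gram": 5}},
        "analyzer": {
            "custom_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "char_filter": ["custom_stop_words_char_filter", "whitespace_remove"],
                # Token filters run in list order, so 'edge' goes last.
                "filter": ["lowercase", "asciifolding", "edge"],
            }
        },
    }
}


def edge_ngrams(token, min_gram=1, max_gram=5):
    # Hypothetical helper approximating edge_ngram: prefixes of min_gram..max_gram chars.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


print(json.dumps(settings, indent=2))
print(edge_ngrams("angryman"))  # ['a', 'an', 'ang', 'angr', 'angry']
```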
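Note: char filters run before the lowercase token filter, so your stopwords are only replaced when they appear in lowercase.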