Question

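I have been trying for a while to handle apostrophe characters at the beginning or in the middle of words. I can handle English possessives, but I am also trying to cater for French and handle words like "d'action", where the apostrophe occurs near the beginning of the word rather than near the end, as in "her's".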

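Searching for "d action" through Haystack's auto_query returns results, but "d'action" returns nothing. If I query the Elasticsearch _search API directly (_search?q=D%27ACTION), I do get results for "d'action". So I am wondering whether this is a Haystack engine issue.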
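For reference, the %27 in that direct query is just the percent-encoded apostrophe. A minimal sketch of building such a query string with the standard library follows; the host and the index name "haystack" are placeholders, not values taken from my setup.

```python
from urllib.parse import quote, urlencode

term = "d'action"

# The apostrophe is not in quote()'s always-safe set, so it is percent-encoded.
encoded_term = quote(term, safe="")
print(encoded_term)  # d%27action

# urlencode() applies the same encoding when building a query string.
query_string = urlencode({"q": term})
print(query_string)  # q=d%27action

# Placeholder host and index name, for illustration only.
url = "http://localhost:9200/haystack/_search?" + query_string
```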

My configuration:

'settings': {
    "analysis": {
        "char_filter": {
            "quotes": {
                "type": "mapping",
                "mappings": [
                    "\\u0091=>\\u0027",
                    "\\u0092=>\\u0027",
                    "\\u2018=>\\u0027",
                    "\\u2019=>\\u0027",
                    "\\u201B=>\\u0027"
                ]
            }
        },
        "analyzer": {
            "ch_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ['ch_en_possessive_word_delimiter', 'ch_fr_stemmer'],
                "char_filter": ['html_strip', 'quotes'],
            },
        },

        "filter": {
            "ch_fr_stemmer" : {
                "type": "snowball",
                "language": "French"
            },
            "ch_en_possessive_word_delimiter": {
                "type": "word_delimiter",
                "stem_english_possessive": True
            }
        }
    }
}
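To make the intent of the "quotes" char_filter concrete, here is a rough Python equivalent of that mapping. This is only an illustration of the normalization, not how Elasticsearch implements it, and the name normalize_quotes is made up for this sketch.

```python
# Map the same code points as the "quotes" char_filter to the plain
# ASCII apostrophe (U+0027).
QUOTE_MAP = str.maketrans({
    "\u0091": "'",
    "\u0092": "'",
    "\u2018": "'",
    "\u2019": "'",
    "\u201b": "'",
})


def normalize_quotes(text: str) -> str:
    """Replace typographic apostrophes with the ASCII apostrophe."""
    return text.translate(QUOTE_MAP)


print(normalize_quotes("d\u2019action"))  # d'action
```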

I also have subclasses of ElasticsearchSearchBackend and BaseEngine so that I can add the configuration above:

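import haystack
from django.conf import settings
from haystack.backends import BaseEngine
from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchQuery,
)


class ConfigurableESBackend(ElasticsearchSearchBackend):
    # Words reserved by Elasticsearch for special use.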
    RESERVED_WORDS = (
        'AND',
        'NOT',
        'OR',
        'TO',
    )

    # Characters reserved by Elasticsearch for special use.
    # The '\\' must come first, so as not to overwrite the other slash replacements.
    RESERVED_CHARACTERS = (
        '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
        '[', ']', '^', '"', '~', '*', '?', ':',
    )

    def setup(self):
        """
        Defers loading until needed.
        """
        # Get the existing mapping & cache it. We'll compare it
        # during the ``update`` & if it doesn't match, we'll put the new
        # mapping.
        try:
            self.existing_mapping = self.conn.get_mapping(index=self.index_name)
        except Exception:
            if not self.silently_fail:
                raise

        unified_index = haystack.connections[self.connection_alias].get_unified_index()
        self.content_field_name, field_mapping = self.build_schema(unified_index.all_searchfields())
        current_mapping = {
            'modelresult': {
                'properties': field_mapping,
                '_boost': {
                    'name': 'boost',
                    'null_value': 1.0
                }
            }
        }

        if current_mapping != self.existing_mapping:
            try:
                # Make sure the index is there first.
                self.conn.create_index(self.index_name, settings.ELASTICSEARCH_INDEX_SETTINGS)
                self.conn.put_mapping(self.index_name, 'modelresult', mapping=current_mapping)
                self.existing_mapping = current_mapping
            except Exception:
                if not self.silently_fail:
                    raise

        self.setup_complete = True

class CHElasticsearchSearchEngine(BaseEngine):
    backend = ConfigurableESBackend
    query = ElasticsearchSearchQuery
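For completeness, a custom engine like this is selected through Haystack's HAYSTACK_CONNECTIONS setting. The sketch below assumes the classes live in a module called myapp.search_backends; that module path, the URL, and the index name are placeholders rather than my actual values.

```python
# settings.py (sketch): point Haystack at the custom engine class.
HAYSTACK_CONNECTIONS = {
    "default": {
        # Hypothetical dotted path to the engine subclass defined above.
        "ENGINE": "myapp.search_backends.CHElasticsearchSearchEngine",
        # Placeholder Elasticsearch URL and index name.
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "haystack",
    },
}
```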

Answer 1

OK, this turned out to have nothing to do with the configuration; the problem was the .txt template used for Haystack indexing.

I had:

{{ object.some_model.name_en }}
{{ object.some_model.name_fr }}

This caused characters like ' to be converted to HTML entities (&#39;), so the search never found any results. Using the "safe" filter fixed the problem:

{{ object.some_model.name_en|safe }}
{{ object.some_model.name_fr|safe }}
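The effect of auto-escaping on the indexed text can be reproduced outside Django. The sketch below uses the standard library's html.escape as a stand-in for the template engine's escaping (it emits &#x27; where the template output above showed &#39;, but the consequence is the same), and a crude regex split as a stand-in for the real tokenizer.

```python
import html
import re

name = "d'action"

# Stand-in for what an auto-escaping template hands to the indexer.
escaped = html.escape(name)
print(escaped)  # d&#x27;action

# Crude word split, only to show how the entity pollutes the tokens.
print(re.findall(r"\w+", escaped))  # ['d', 'x27', 'action']
print(re.findall(r"\w+", name))     # ['d', 'action']

# Both entity spellings decode back to the plain apostrophe.
print(html.unescape("d&#39;action"))  # d'action
```

With the entity left in the indexed document, the stored text no longer contains the apostrophe form that the query uses, which fits the symptom of "d'action" returning nothing.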

How to configure Haystack/Elasticsearch to handle contractions and apostrophes near the beginning of words