Question

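I have an HTML file in which I need to find the part that exactly matches a string, say "ANNUAL REPORT PURSUANT". I am using the latest version of Elasticsearch, 5.4.0, and I am new to Elasticsearch. For the index, I have defined the analyzer as follows: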

{
    "index_name": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "provided_name": "index_name",
                "creation_date": "1496927173220",
                "analysis": {
                    "analyzer": {
                        "contact_section_analyzer": {
                            "tokenizer": "my_tokenizer"
                        }
                    },
                    "tokenizer": {
                        "my_tokenizer": {
                            "pattern": "(ANNUAL REPORT PURSUANT)",
                            "type": "pattern",
                            "group": "1"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "vF3cAe-STJW-GrVxc7N8ww",
                "version": {
                    "created": "5040099"
                }
            }
        }
    }
}

Now I am trying to get the offsets using _analyze, as follows:

POST localhost:9200/sag_sec_items6/_analyze?pretty
{
  "analyzer": "contact_section_analyzer", 
  "text": "my_html_file_contents_already_indexed"
}

It returns:

{
    "tokens": []
}

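I have checked that the HTML file does contain that text.

Using a _search query with individual _ids, I get the whole HTML file back. How can I get the offsets of that text, or the HTML tag that contains it?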

Answer 1

I redefined my analyzer settings as follows:

set

With this change to the regex pattern, and with the CASE_INSENSITIVE | DOTALL flags included in the pattern analyzer, I was able to get the offsets.
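To illustrate why the flags matter, here is a rough sketch in Python rather than in Elasticsearch itself. It only imitates what a pattern tokenizer with a capture group does: each match of the group becomes a token with a start and end offset. The helper name `pattern_tokens`, the sample HTML, and the relaxed pattern are all made up for the example, since the actual settings used above are not shown. Python's `re.IGNORECASE` and `re.DOTALL` stand in for the Java regex flags CASE_INSENSITIVE and DOTALL, so treat it as an analogy and not as the exact Elasticsearch behaviour.

```python
import re


def pattern_tokens(text, pattern, flags=0, group=1):
    """Hypothetical helper: emit capture group `group` of every match
    as a token with character offsets, loosely imitating a pattern
    tokenizer configured with a `group` setting."""
    tokens = []
    for position, match in enumerate(re.finditer(pattern, text, flags)):
        tokens.append({
            "token": match.group(group),
            "start_offset": match.start(group),
            "end_offset": match.end(group),
            "position": position,
        })
    return tokens


# Made-up document: the phrase is in mixed case and broken across a line.
html = "<html><body>\n<p>Annual Report\nPursuant to Section 13</p>\n</body></html>"

# Case-sensitive pattern with literal spaces: nothing matches, so the
# token list is empty, much like the empty "tokens" array in the question.
strict = pattern_tokens(html, r"(ANNUAL REPORT PURSUANT)")

# Assumed relaxed pattern: "." may match a newline (DOTALL) and letter
# case is ignored (IGNORECASE), so the phrase is found with its offsets.
relaxed = pattern_tokens(html, r"(annual.report.pursuant)",
                         re.IGNORECASE | re.DOTALL)

print(strict)
print(relaxed)
```

Slicing the original text with the reported offsets, `html[start_offset:end_offset]`, gives back the matched phrase, which is how the offsets can be mapped back to a position in the HTML.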

Elasticsearch: need the offsets of an exact-match string
