I have the following mapping:
{
  "articles":{
    "mappings":{
      "article":{
        "_all":{
          "enabled":false
        },
        "_source":{
          "enabled":false
        },
        "properties":{
          "content":{
            "type":"string",
            "norms":{
              "enabled":false
            }
          },
          "url":{
            "type":"string",
            "index":"not_analyzed"
          }
        }
      }
    },
    "settings":{
      "index":{
        "refresh_interval":"30s",
        "number_of_shards":"20",
        "analysis":{
          "analyzer":{
            "default":{
              "filter":[
                "icu_folding",
                "icu_normalizer"
              ],
              "type":"custom",
              "tokenizer":"icu_tokenizer"
            }
          }
        },
        "number_of_replicas":"1"
      }
    }
  }
}
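The question is: can I somehow extract the actual value of the url field, given that it is not_analyzed and _source is not enabled? I only need to do this once for this index, so even a hacky way is acceptable.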
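I know that not_analyzed means the string is not tokenized, so I feel it should be stored somewhere, but I don't know whether it is stored as a hash or 1:1, and I couldn't find anything about it in the documentation.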
My server is running ES version 1.4.4 with JVM 1.8.0_31.
Answer (score: 1)
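You can read the field data to retrieve the URLs from the documents. We'll be reading straight from the ES index, so we get exactly what would be "matched" against, which in this case is the exact URL: because url is not_analyzed, the whole URL is indexed as a single, untokenized term.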
Using the example index you provided (on a smaller subset of it), I indexed two URLs:
POST /articles/article/1
{
"url":"https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html"
}
POST /articles/article/2
{
"url":"http://stackoverflow.com/questions/37488389/can-i-extract-the-actual-value-of-not-analyzed-field-when-source-is-disabled"
}
This query then gives me a new "fields" object on each hit:
GET /articles/article/_search
{
"fielddata_fields" : ["url"]
}
Which gives us these results:
"hits": [
{
"_index": "articles",
"_type": "article",
"_id": "2",
"_score": 1,
"fields": {
"url": [
"http://stackoverflow.com/questions/37488389/can-i-extract-the-actual-value-of-not-analyzed-field-when-source-is-disabled"
]
}
},
{
"_index": "articles",
"_type": "article",
"_id": "1",
"_score": 1,
"fields": {
"url": [
"https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html"
]
}
}
]
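The query above only returns the first 10 hits. For a one-off, hacky extraction of every URL, a minimal Python sketch (standard library only) could page through the index with from/size and collect the "fields" values. It assumes the cluster is reachable at http://localhost:9200 and that nothing is written to the index while it runs; ES_URL, urls_from_response and fetch_all_urls are hypothetical names chosen for illustration, and from/size paging is used here instead of scroll purely for simplicity.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def urls_from_response(response):
    """Map each hit's _id to its url, read from the hit's "fields" object."""
    urls = {}
    for hit in response["hits"]["hits"]:
        values = hit.get("fields", {}).get("url")
        if values:  # hits without a url value get no entry
            urls[hit["_id"]] = values[0]  # not_analyzed: a single term per doc
    return urls


def fetch_all_urls(index="articles", doc_type="article", page_size=500):
    """Page through the whole index with from/size and collect every url.

    Assumes the index is not being written to while this runs, so page
    boundaries stay stable between requests.
    """
    urls = {}
    offset = 0
    while True:
        body = {
            "query": {"match_all": {}},
            "fielddata_fields": ["url"],
            "from": offset,
            "size": page_size,
        }
        # Supplying data makes urllib send a POST, which _search accepts.
        request = urllib.request.Request(
            "%s/%s/%s/_search" % (ES_URL, index, doc_type),
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as resp:
            page = json.loads(resp.read().decode("utf-8"))
        hits = page["hits"]["hits"]
        if not hits:  # past the last document
            break
        urls.update(urls_from_response(page))
        offset += len(hits)
    return urls
```

Deep from/size paging gets slower as the offset grows, and loading field data for url puts its terms into the JVM heap, so on a large index this is best treated as the single run the question describes.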
Hope that helps!