ElasticSearch: returning multi_field field values in the query response

Time: 2014-01-17 14:58:32

Tags: elasticsearch

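My question is how to return the tokens of a multi_field sub-field when running a query. I only seem to be able to get the value of the multi_field itself, not the analyzed token value.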

I set up a multi_field on my url field to split out the file extension (if there is one). This creates the following mapping:

{
  "url": {
    "type": "multi_field",
    "fields": {
      "ext": {
        "type": "string",
        "analyzer": "url_ext_analyzer",
        "include_in_all": false
      },
      "untouched": {
        "type": "string",
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs",
        "include_in_all": false
      }
    }
  }
}

In my test query, I try to return the url.ext field value in the response with the following:

{
  "query": {
    "match_all": {}
  },
  "filter": {
    "term": {
      "url.ext": "pdf"
    }
  },
  "fields": [
    "_id",
    "_type",
    "url",
    "title",
    "url.ext"
  ]
}

But it does not appear in the response (the other fields I asked for do show up in the fields array):

{
  "hits": [
    {
      "_index": "test2",
      "_type": "doc",
      "_id": "1",
      "_score": 1,
      "fields": {
        "url": "http://bacon.com/static/764612436137067/cms/documents/bacon-ipsum.pdf",
        "title": "Bacon ipsum"
      }
    }
  ]
}

Bash script used to create the example:

curl -XDELETE localhost:9200/test2?pretty
curl -XPOST localhost:9200/test2?pretty -d '{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
      },
      "char_filter": {
        "myFileExtRegex": {
          "type": "pattern_replace",
          "pattern": "(.*)\\.([a-z]{3,5})$",
          "replacement": "$2"
        }
      },
      "analyzer": {
        "url_ext_analyzer": {
          "type": "custom",
          "char_filter": [
            "myFileExtRegex"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}'

curl -XPUT localhost:9200/test2/doc/_mapping?pretty -d '{
  "tweet": {
    "index_analyzer": "standard",
    "search_analyzer": "standard",
    "date_formats": [
      "yyyy-MM-dd",
      "dd-MM-yyyy"
    ],
    "properties": {
      "title": {
        "type": "string",
        "analyzer": "standard"
      },
      "content": {
        "type": "string",
        "analyzer": "standard"
      },
      "url": {
        "type": "multi_field",
        "fields": {
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          },
          "ext": {
            "type": "string",
            "analyzer": "url_ext_analyzer",
            "stored": "yes"
          }
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/test2/doc/1?pretty' -d '{
  "content": "Bacon ipsum dolor sit amet ham drumstick jowl ham hock capicola meatball shankle pork filet mignon ground round jerky turkey prosciutto",
  "title": "Bacon ipsum",
  "url": "http://bacon.com/static/764612436137067/cms/documents/bacon-ipsum.pdf"
}'

curl -XGET localhost:9200/test2/_mapping?pretty

1 Answer:

Answer 0 (score: 2)

In your mapping, you should have "store": "yes" rather than "stored": "yes". A simple typo.
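A minimal sketch of what the corrected mapping call might look like, assuming the index and type names from the question (test2, doc). Only the url property is shown. The ES_URL variable and the put_mapping function are names introduced here for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# Assumption: a node is reachable at this address (override via ES_URL).
ES_URL="${ES_URL:-localhost:9200}"

# The url mapping from the question, with "stored" corrected to "store"
# on the ext sub-field and "store" also added to the untouched sub-field.
MAPPING='{
  "doc": {
    "properties": {
      "url": {
        "type": "multi_field",
        "fields": {
          "untouched": {
            "type": "string",
            "index": "not_analyzed",
            "store": "yes"
          },
          "ext": {
            "type": "string",
            "analyzer": "url_ext_analyzer",
            "store": "yes"
          }
        }
      }
    }
  }
}'

# Hypothetical helper: sends the mapping to the test2 index when called.
put_mapping() {
  curl -XPUT "$ES_URL/test2/doc/_mapping?pretty" -d "$MAPPING"
}
```

Calling put_mapping after the index has been created (and before indexing the document) should apply it.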

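I'm not positive that your regex is working as expected, but fixing the typo solves the problem of returning the field in the search request. You'll notice that the "url" and "url.ext" fields both return the same content, which sounds like it may not be what you want, but I'm not sure.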
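One rough way to check the regex outside Elasticsearch is to run the same pattern through sed. This is only an approximation: sed -E uses POSIX extended regular expressions rather than the Java regex engine behind pattern_replace, although for a pattern this simple the two should agree. The extract_ext function is a hypothetical helper, not something from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the question's pattern and replacement
# ("(.*)\.([a-z]{3,5})$" -> "$2") to a single URL using sed.
extract_ext() {
  printf '%s\n' "$1" | sed -E 's/(.*)\.([a-z]{3,5})$/\2/'
}

# URL taken from the question's sample document.
extract_ext "http://bacon.com/static/764612436137067/cms/documents/bacon-ipsum.pdf"
# A URL without a file extension is left unchanged by the pattern.
extract_ext "http://bacon.com/about"
```

Under this approximation the first call prints pdf and the second prints the URL unchanged. Note that a bare host such as http://bacon.com would come out as com, since the pattern cannot tell a top-level domain from a file extension.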

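Here is a runnable example. I added "store": "yes" to both url sub-fields, and added a couple of facets to the search request so that you can see the tokenized content of the url sub-fields. I also changed "tweet" to "doc" in the mapping, which seems to be what you meant.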

http://sense.qbox.io/gist/60c448df41827146e93daf0a93591f001d46e42f
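
In case the link is unavailable, here is a sketch of what such a search request might look like. A stored field returns the original field text rather than the analyzed tokens, which is why terms facets are used here to expose the indexed tokens. The facet names, the ES_URL variable, and the run_search function are assumptions made for illustration, and facets are only available in the pre-2.0 releases this question targets.

```shell
#!/usr/bin/env bash
# Assumption: a node is reachable at this address (override via ES_URL).
ES_URL="${ES_URL:-localhost:9200}"

# The question's search request plus two terms facets, one per url
# sub-field, so the indexed tokens show up in the response.
SEARCH='{
  "query": { "match_all": {} },
  "filter": { "term": { "url.ext": "pdf" } },
  "fields": ["_id", "_type", "url", "title", "url.ext"],
  "facets": {
    "url_ext_tokens": {
      "terms": { "field": "url.ext" }
    },
    "url_untouched_tokens": {
      "terms": { "field": "url.untouched" }
    }
  }
}'

# Hypothetical helper: runs the search against the test2 index when called.
run_search() {
  curl -XPOST "$ES_URL/test2/_search?pretty" -d "$SEARCH"
}
```

The facets section of the response should then list the terms indexed for each sub-field, alongside the stored values under fields.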