Searching phone numbers in Elasticsearch

Time: 2016-02-16 03:02:35

Tags: postgresql ruby-on-rails-4 elasticsearch elasticsearch-rails

I have a Postgres array column that I want to index and then use in searches. Here is an example:

phones = ["+175 (2) 123-25-32", "123456789", "+12 111-111-11"]

I analyzed the tokens with the analyze API, and Elasticsearch splits the field into multiple tokens, as shown below:

curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : [ "+175 (2) 123-25-32", "123456789", "+12 111-111-11" ]
}'


{
  "tokens": [
    {
      "token": "analyzer",
      "start_offset": 6,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "standard",
      "start_offset": 19,
      "end_offset": 27,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "text",
      "start_offset": 33,
      "end_offset": 37,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "175",
      "start_offset": 45,
      "end_offset": 48,
      "type": "<NUM>",
      "position": 4
    },
    {
      "token": "2",
      "start_offset": 50,
      "end_offset": 51,
      "type": "<NUM>",
      "position": 5
    },
    {
      "token": "123",
      "start_offset": 53,
      "end_offset": 56,
      "type": "<NUM>",
      "position": 6
    },
    {
      "token": "25",
      "start_offset": 57,
      "end_offset": 59,
      "type": "<NUM>",
      "position": 7
    },
    {
      "token": "32",
      "start_offset": 60,
      "end_offset": 62,
      "type": "<NUM>",
      "position": 8
    },
    {
      "token": "123456789",
      "start_offset": 66,
      "end_offset": 75,
      "type": "<NUM>",
      "position": 9
    },
    {
      "token": "12",
      "start_offset": 80,
      "end_offset": 82,
      "type": "<NUM>",
      "position": 10
    },
    {
      "token": "111",
      "start_offset": 83,
      "end_offset": 86,
      "type": "<NUM>",
      "position": 11
    },
    {
      "token": "111",
      "start_offset": 87,
      "end_offset": 90,
      "type": "<NUM>",
      "position": 12
    },
    {
      "token": "11",
      "start_offset": 91,
      "end_offset": 93,
      "type": "<NUM>",
      "position": 13
    }
  ]
}

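I want Elasticsearch either to skip tokenization and store the number without special characters (for example, "+175 (2) 123-25-32" would become "+17521232532"), or simply to index the number as is, so that it is returned in search results.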

My mapping is as follows:

{ :id => { :type => "string"}, :secondary_phones => { :type => "string" } }

Here is how I am trying to query:

      settings = {
        query: {
          filtered: {
            filter: {
              bool: {
                should: [
                  { terms: { phones: [ "+175 (2) 123-25-32", "123456789", "+12 111-111-11" ] } },
                ]
              }
            }
          }
        },
        size: 100,
      }

P.S. I also tried removing the special characters, but with no luck.

I am sure this is achievable and that I am missing something. Suggestions, please.

Thanks.

1 answer:

Answer 0 (score: 0)

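If you only want to perform exact matches on your data, as in your terms query example, the best approach is to set the index mapping parameter to not_analyzed in your mapping. See the Elasticsearch documentation for the index mapping parameter.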
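A minimal sketch of what such a mapping could look like, reusing the hash style and the secondary_phones field from the question. It assumes the setting meant here is index: "not_analyzed" on a string field, which applies to Elasticsearch versions before 5.0 (later versions use the keyword type instead). The constant name PHONE_MAPPING is only illustrative.

```ruby
require "json"

# Hypothetical constant name; the field names come from the question's mapping.
PHONE_MAPPING = {
  id: { type: "string" },
  # not_analyzed: each array element is indexed as one exact token
  # (pre-5.0 string field syntax, assumed here).
  secondary_phones: { type: "string", index: "not_analyzed" }
}.freeze

# Print the JSON that would go under "properties" in the mapping request.
puts JSON.pretty_generate(PHONE_MAPPING)
```

The index option of an existing field cannot be changed in place, so the index would need to be recreated and the documents reindexed for this to take effect.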

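This completely disables analysis (tokenization) of the values and treats the content of the field (each item in the array) as a single token/keyword.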
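Because an unanalyzed field only matches exactly, the indexed values and the search terms need to share one format. One way to get the "+17521232532" form mentioned in the question is to normalize the numbers in the application, both before indexing and before querying. The sketch below is an assumption about how that could be done, not something the answer itself spells out. normalize_phone and phone_terms_query are hypothetical helper names, and the query targets secondary_phones, the field name used in the question's mapping.

```ruby
# Hypothetical helper: drop every character except digits and "+".
def normalize_phone(raw)
  raw.gsub(/[^\d+]/, "")
end

# Hypothetical helper: build the filtered/terms query from the question,
# with every search value normalized the same way as the indexed values.
def phone_terms_query(numbers, field: :secondary_phones)
  {
    query: {
      filtered: {
        filter: {
          bool: {
            should: [
              { terms: { field => numbers.map { |n| normalize_phone(n) } } }
            ]
          }
        }
      }
    },
    size: 100
  }
end

phones = ["+175 (2) 123-25-32", "123456789", "+12 111-111-11"]
p phones.map { |n| normalize_phone(n) }
```

The same normalize_phone call would have to be applied to the array before it is sent to Elasticsearch for indexing, otherwise the stored tokens and the query terms would still differ.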