Question

My database is populated with documents like this:

{
  _index: "bla_bla",
  .
  .
  .
  _source: {
    domain: "somedomain.extension",
    path: "/you/know/the/path",
    lang: "en",
    keywords: ["yeah", "you", "rock", "dude", "help", "me", "good", "samaritan"]
  }
}

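Searching works like a charm, whatever I look for. But as soon as I try to filter on the field called path, it just doesn't work, and no error or warning is thrown. After a lot of research I suspect it's because of the slash at the beginning of the path. I may or may not be right, but either way I need to filter like this: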

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "should": {
                        "terms": {
                            "keywords": ["stackoverflow", "rocks", "!"]
                        }
                    },
                    "must_not": {
                        "term": {
                            "path": "/"
                            // This works, i.e -> "lang": "en"
                        }
                    }
                }       
            }
        }
    },
    "from": 0,
    "size": 9
}

TL;DR: Having a database of URLs, how do I get only the non-root ones [path longer than "/"]?

Answer 1

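By default, Elasticsearch analyzes string fields, and the analyzer splits text on many characters, including slashes. What you need to do is index the field as "not_analyzed". Here is a working example; note the index specification on the "path" field: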

PUT /index1/test/_mapping
{
    "test" : {
        "properties" : {
            "message" : {"type" : "string"},
            "path" : {"type" : "string", "index" : "not_analyzed"}
        }
    }
}

POST index1/test
{
  "path" : "/foo/bar"  
}

GET index1/test/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "path": "/foo/bar"
        }
      }
    }
  }
}
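
Applied to the question's data, this could look like the sketch below. It assumes the "bla_bla" index has been recreated with "path" mapped as "not_analyzed" (the "index" setting of an existing field can't be changed in place, so the documents would have to be reindexed), and it is meant as the body of a request to bla_bla/_search. With the analyzed mapping, a path like "/you/know/the/path" is stored as slash-free tokens, so the term "/" never matches anything and the must_not excludes nothing; the _analyze API can be used to confirm what tokens are produced. With "not_analyzed", the whole path is stored as one term, so the root path "/" can be excluded exactly:

```json
{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "should": {
                        "terms": {
                            "keywords": ["stackoverflow", "rocks", "!"]
                        }
                    },
                    "must_not": {
                        "term": {
                            "path": "/"
                        }
                    }
                }
            }
        }
    },
    "from": 0,
    "size": 9
}
```

Note that a term filter on a "not_analyzed" field is an exact, case-sensitive match on the whole value, so "/" here only excludes documents whose path is exactly the root.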

Answer 2

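Disclaimer: I'm not an expert on ES, but if I understand correctly, you want to exclude all documents whose path is just "/". Since you always store your data as /path, the unwanted strings contain only one character, so why not use a regex?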

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-filter.html

I think something like this would do the trick:
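
A minimal sketch of such a regexp filter, written as the body of a request to the index's _search endpoint. It assumes "path" is indexed as "not_analyzed" (as in Answer 1), because on an analyzed field the regexp is matched against individual tokens, which contain no slashes. The pattern "/.+" is an illustrative choice: Lucene regular expressions are anchored to the whole term, so it should match any path with at least one character after the leading slash and skip the bare "/".

```json
{
    "query": {
        "filtered": {
            "filter": {
                "regexp": {
                    "path": "/.+"
                }
            }
        }
    }
}
```

Regexp filters have to scan the terms of the field, so on a large index the must_not term filter on "/" is likely the cheaper way to get the same result.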
