Question

I have terabytes of data stored in ES in the following way:

"class": {
  "type": "nested",
  "properties": {
    "name": {"type": "string"},
    "methods": [ {
      "name": {"type": "string"}
    } ]
  }
}

Simply put, I store the data as (class1, [method1, method2, ...]), (class2, [method3, method4, ...]) ...

I read in the ES documentation that all data is reduced to Lucene key-value pairs, but I am not sure whether that is relevant here.

Would search latency be lower if I arranged the data as follows instead? {class1, method1}, {class1, method2}, ... {class2, method3} ...

Example query: search for a given class name and method name pair, and return all documents in the index that contain that pair.

Any help is appreciated. If there is a better way to handle this, please suggest it.

Answer 1

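Between your two options (one nested document per class versus one nested document per class-and-method pair), there should be no noticeable difference in search time. Personally, I prefer the first option, because it seems like the better data model. It also means fewer documents are needed overall. (Keep in mind that a "nested" document in ES is really just another real document in Lucene. ES simply takes care of placing the nested documents right next to the parent document so that the relationship can be managed efficiently.)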
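
To make the document-count argument concrete, here is a rough Python model of the point above. It is a simplification for illustration only, not ES internals: the `lucene_docs` helper and the `kind` marker are hypothetical names, and the only behavior modeled is that each nested object becomes its own Lucene-level document stored in the same block as its parent.

```python
# Simplified model (an assumption for illustration, not actual ES code):
# every nested object becomes a separate Lucene-level document that is
# stored in the same block as the parent document.

def lucene_docs(parent, nested_field):
    """Return the flat list of Lucene-level documents for one ES document."""
    nested = parent.get(nested_field, [])
    if isinstance(nested, dict):  # a single object behaves like a one-element list
        nested = [nested]
    block = [{"kind": "nested", **obj} for obj in nested]
    root = {k: v for k, v in parent.items() if k != nested_field}
    block.append({"kind": "parent", **root})  # the parent closes the block
    return block

# Option 1: one nested object per class, methods kept as an array.
option1 = {
    "someField": "A",
    "classes": [
        {"class": "Java.lang.class1", "method": ["myMethod1", "myMethod2"]},
    ],
}

# Option 2: one nested object per (class, method) pair.
option2 = {
    "someField": "A",
    "classes": [
        {"class": "Java.lang.class1", "method": "myMethod1"},
        {"class": "Java.lang.class1", "method": "myMethod2"},
    ],
}

print(len(lucene_docs(option1, "classes")))  # 2
print(len(lucene_docs(option2, "classes")))  # 3
```

Under this model, option 2 adds one extra Lucene document per additional method, which is where the "fewer documents" remark for option 1 comes from.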

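Internally, ES treats every value as an array, so it is certainly well suited to the first option. Assume an example mapping like the following (written in the pre-5.0 mapping syntax, where a `string` field with `"index": "not_analyzed"` plays the role that `keyword` plays in later versions):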

PUT /test_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "someField": { "type": "string" },
        "classes": {
          "type": "nested", 
          "properties": {
            "class": { "type":"string", "index":"not_analyzed" },
            "method": { "type": "string", "index":"not_analyzed" }
          }
        }
      }
    }
  }
}

Then you can index documents such as:

POST test_index/my_type
{
  "someField":"A",
  "classes": {
    "class":"Java.lang.class1",
    "method":["myMethod1","myMethod2"]
  }
}

POST test_index/my_type
{
  "someField":"B",
  "classes": {
    "class":"Java.lang.class2",
    "method":["myMethod3","myMethod4"]
  }
}
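
Note that the mapping declares `method` as a single string field, while the documents supply a list of strings. That works because of the "every value is an array" behavior mentioned earlier. A minimal Python sketch of the idea, assuming nothing about the real implementation (`as_values` and `term_matches` are hypothetical helpers):

```python
def as_values(value):
    """Normalize a field value to a list, so scalars and arrays look the same."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]

def term_matches(obj, field, term):
    """Exact-match check that does not care whether the field holds one value or many."""
    return term in as_values(obj.get(field))

single = {"class": "Java.lang.class1", "method": "myMethod1"}
multi = {"class": "Java.lang.class1", "method": ["myMethod1", "myMethod2"]}

print(term_matches(single, "method", "myMethod1"))  # True
print(term_matches(multi, "method", "myMethod2"))   # True
print(term_matches(multi, "method", "myMethod3"))   # False
```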

To satisfy your example query, you just need a nested query that wraps a bool query with filter clauses. For example:

GET test_index/my_type/_search
{
  "query": {
    "nested": {
      "path": "classes",
      "query": {
        "bool": {
          "filter": [
            { "term": {"classes.class":"Java.lang.class2"} },
            { "term": {"classes.method":"myMethod3"} }
          ]
        }
      }
    }
  }
}

This returns the second document from my example.
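
The matching rule behind that result can be sketched in plain Python. This is only a model of the semantics, under the assumption that a nested query matches a document when a single nested object satisfies every filter clause; `nested_match` and `to_list` are hypothetical helpers, and no ES client is involved.

```python
# The two example documents from above, as plain dictionaries.
docs = [
    {"someField": "A",
     "classes": {"class": "Java.lang.class1",
                 "method": ["myMethod1", "myMethod2"]}},
    {"someField": "B",
     "classes": {"class": "Java.lang.class2",
                 "method": ["myMethod3", "myMethod4"]}},
]

def to_list(value):
    """Treat a missing value as empty and a single value as a one-element list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]

def nested_match(doc, path, terms):
    """True if ONE nested object under `path` satisfies every term filter."""
    for obj in to_list(doc.get(path)):
        if all(value in to_list(obj.get(field)) for field, value in terms.items()):
            return True
    return False

pair = {"class": "Java.lang.class2", "method": "myMethod3"}
hits = [d["someField"] for d in docs if nested_match(d, "classes", pair)]
print(hits)  # ['B']

# A class and a method that never occur in the same nested object do not match.
mixed = {"class": "Java.lang.class1", "method": "myMethod3"}
print([d["someField"] for d in docs if nested_match(d, "classes", mixed)])  # []
```

The second lookup is the reason for using nested objects here: a class name and a method name only count as a pair when they sit in the same nested object.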

Efficiently handling strings and arrays of strings
