Question

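I'm working on a use case where I want to determine which documents in Elasticsearch contain a specific keyword in any field value (indexed or non-indexed). In SQL this would be straightforward, though somewhat heavy. In Postgres, for example, we could do: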

select count(1) from table where data::text like '%keyword%'

In Elasticsearch, however, I'm experimenting with Painless scripts. Here is what I'm trying:

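 - Use a script filter
 - Add a Painless script that converts the entire document JSON to a string
 - In the script, return true if the document string contains the keyword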
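
To make the intent of the steps above concrete, here is a minimal sketch of the same idea in plain Python, outside Elasticsearch. It assumes the document is already available as a dict; the function name `doc_contains` and the sample document are hypothetical and exist only for illustration.

```python
import json

def doc_contains(doc: dict, keyword: str) -> bool:
    """Serialize the whole document to a JSON string and do a substring check."""
    # ensure_ascii=False keeps non-ASCII text searchable as written
    text = json.dumps(doc, ensure_ascii=False)
    return keyword in text

# Hypothetical sample document
sample = {"id": "1557e321", "tags": ["alpha", "beta"], "meta": {"owner": "carol"}}
print(doc_contains(sample, "carol"))
print(doc_contains(sample, "dave"))
```

Because field names are part of the serialized string, this check matches keys as well as values.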

So far I have tried the following variations in the scripted_field clause to convert the whole document to a string:

 - doc.toString()
 - doc['_source'].value
 - params._source.toString()
 - ctx._source.value.toString()

None of them worked.

However, retrieving an individual field value does work. The following query gives the result shown below it:

"script_fields": {
    "myfield": {
      "script": """
            return params._source.id.toString();
      """
    }
}

Result:

{
  "_index": "myindex",
  "_type": "mytype",
  "_id": "1557e321-b6be-491f-a869-9309194af658",
  "_score": 13.808922,
  "myfield": {
    "text": [
      "1557e321-b6be-491f-a869-9309194af658"
    ]
  }
}

Can anyone help me convert the entire document JSON to a string at runtime in ES?

Answer 1

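After a lot of experimentation and consulting several blog posts, I settled on a workaround: iterate over every field of the document and recursively build up the document string.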

Here is the final working version:

{
  "query": {
    "bool": {
      "must": [
        <some clause here>
      ],
      "filter": {
        "script": {
          "script": """
              // Recursively append every leaf value under x to sb.
              void processFields(StringBuilder sb, def x) {
                if (x == null) {
                  return;
                }
                if (x instanceof List) {
                  for (def v : x) {
                    processFields(sb, v);
                  }
                } else if (x instanceof Map) {
                  for (def v : ((Map) x).values()) {
                    processFields(sb, v);
                  }
                } else {
                  // Leaf value: append it, followed by the delimiter.
                  sb.append(x);
                  sb.append('~|~');
                }
              }
            
            StringBuilder sb = new StringBuilder();
            for (key in params._source.keySet()) {
              processFields(sb, params._source.get(key));
            }
            String s = sb.toString();
            return s.contains('<keyword-to-search-in-entire-document-string>');
"""
        }
      }
    }
  }
}
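
For readers who want to experiment with the traversal logic outside Elasticsearch, the following is a rough Python mirror of the script above, not the Painless code itself. It is simplified to collect leaf values only; the names `process_fields` and `source_contains` and the sample `source` dict are hypothetical.

```python
def process_fields(parts: list, x) -> None:
    """Collect every leaf value under x as a string, depth first."""
    if x is None:
        return
    if isinstance(x, list):
        for v in x:
            process_fields(parts, v)
    elif isinstance(x, dict):
        for v in x.values():
            process_fields(parts, v)
    else:
        parts.append(str(x))

def source_contains(source: dict, keyword: str, sep: str = "~|~") -> bool:
    """Flatten all values of the document and check for the keyword."""
    parts = []
    for value in source.values():
        process_fields(parts, value)
    return keyword in sep.join(parts)

# Hypothetical document standing in for params._source
source = {"id": 7, "tags": ["alpha", "beta"], "meta": {"owner": "carol", "notes": None}}
print(source_contains(source, "carol"))
print(source_contains(source, "owner"))
```

Since only values are collected, a keyword that appears solely as a field name (such as `owner` here) does not match.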

Note: make sure the delimiter used in processFields() does not appear in the keyword you are searching for. :)
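
A small Python illustration of why the delimiter matters, using two made-up values: without a delimiter, text from adjacent fields runs together and can produce a false match, and a keyword that itself contains the delimiter can match across a field boundary.

```python
values = ["foo", "bar"]  # hypothetical field values

joined_plain = "".join(values)    # "foobar"
joined_sep = "~|~".join(values)   # "foo~|~bar"

# No delimiter: the end of one value runs into the start of the next.
print("oba" in joined_plain)      # True, a false positive

# With a delimiter: the boundary between values is preserved.
print("oba" in joined_sep)        # False

# A keyword containing the delimiter matches across the boundary.
print("foo~|~bar" in joined_sep)  # True
```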

Reference for recursively iterating over all document fields: https://alexmarquardt.com/category/painless/
