Question

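I indexed 2,000,000 documents in Elasticsearch (using the elastic package in R), and I would like to know the most frequent terms in a specific field. Say the field is called 'X' and contains strings. However, the aggregation throws an error: Error: 400 - all shards failed

I tried the following in R (the example is adapted from the elastic package manual).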

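Step 1

I first created an index with an explicit mapping. In the original index the 'X' field was indexed as a 'keyword' field rather than as text, which I thought might be the problem.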

    body <- list(test = list(properties = list(
         X = list(type="text"),
         Y = list(type="long")
         )))
    if (!index_exists("example")) index_create("example")
    mapping_create(index = "example", type = "test", body=body)

Step 2

Next, I indexed a handful of documents.

    X <- c("xxx first","xxx second","xxx third","yyy fourth")
    Y <- c("21","22","24","17")
    data <- data.frame(X,Y)

    docs_bulk(x=data,index='example',type = "test")

Step 3

Next, I created the aggregation query and ran it in R.

    body <- '{
      "size": 0,
      "aggs": {
        "frequent_tags": {
          "terms": {"field": "X"}
        }
      }
    }'

    Search(index='example',body=body)

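Step 4

... and I get the error message: "Error: 400 - all shards failed"

Steps 5 and 6

Next, I added "attribute." to the field name in the body (i.e. {"field": "attribute.X"}). The query now executes, but returns no results. I also tried {"field": "keyword.X"}, but that did not give the expected result either.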

Expected result

An object that says:

xxx --> 3 documents
yyy --> 1 document
first --> 1 document
second --> 1 document
fourth --> 1 document

Thanks for your help; let me know if you need more information.

Answer 1

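Maintainer of elastic here: when trying to track down a problem on the Elasticsearch side, the first thing to do is call connect(errors = "complete"). When there is an error, it prints the full Elasticsearch stack trace in the R console, which should tell you exactly where the problem in your query is.

I followed your example above, set connect(errors = "complete"), and got: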

    Search(index='example',body=body)
    Error: 400 - all shards failed
    ES stack trace:

      type: illegal_argument_exception
      reason: Fielddata is disabled on text fields by default. Set fielddata=true on
        [X] in order to load fielddata in memory by uninverting the inverted index.
        Note that this can however use significant memory. Alternatively use a keyword
        field instead.

with

    elastic::ping()$version$number
    [1] "6.6.1"
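
The stack trace names two ways out: enable fielddata on the text field, or aggregate on a keyword field. Because the expected result counts individual tokens (xxx, first, ...), and a keyword field stores each whole string as a single term, the sketch below takes the fielddata route. It is only a sketch under assumptions: it uses the same older elastic API as the question (a global connect() and functions called without a connection object), the helper name aggregate_terms is made up for illustration, and it drops and recreates the toy "example" index rather than touching the real one.

```r
# Same mapping as in the question, except that fielddata is enabled on X
mapping_body <- list(test = list(properties = list(
  X = list(type = "text", fielddata = TRUE),
  Y = list(type = "long")
)))

# Same terms aggregation as in the question
agg_body <- '{
  "size": 0,
  "aggs": {
    "frequent_tags": {
      "terms": {"field": "X"}
    }
  }
}'

# Hypothetical helper (not part of elastic): rebuilds the toy index with the
# mapping above, indexes the four sample rows, and returns the term buckets.
# Assumes a reachable Elasticsearch 6.x node and the older elastic API.
aggregate_terms <- function(index = "example", type = "test") {
  elastic::connect(errors = "complete")

  # Start from a clean index so the new mapping applies to all documents
  if (elastic::index_exists(index)) elastic::index_delete(index)
  elastic::index_create(index)
  elastic::mapping_create(index = index, type = type, body = mapping_body)

  data <- data.frame(
    X = c("xxx first", "xxx second", "xxx third", "yyy fourth"),
    Y = c(21, 22, 24, 17),
    stringsAsFactors = FALSE
  )
  elastic::docs_bulk(x = data, index = index, type = type)

  # Crude wait so the bulk-indexed documents become searchable
  Sys.sleep(2)

  res <- elastic::Search(index = index, body = agg_body)
  res$aggregations$frequent_tags$buckets
}
```

Calling aggregate_terms() against a running cluster should return one bucket per token, each with its doc_count. On the full 2,000,000-document index, the memory warning in the stack trace is worth taking seriously before enabling fielddata.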
