Sorting by the timestamp of a nested child using custom_score

Time: 2013-04-05 21:21:29

Tags: elasticsearch

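I'm very new to elasticsearch and have been trying to get this sort to work. The general idea is to search email message threads that contain nested messages and nested participants. The goal is to show search results at the thread level, sorted by the dates of the participant who is doing the search: either last_received_at or last_sent_at, depending on which mailbox they are in.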

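My understanding is that you can't sort by the value of a single child among many nested children. To work around that, I've seen suggestions to use custom_score with a script and then sort on the score. My plan is to change the sort column dynamically and run a nested custom_score query that returns one participant's date as the score. I keep running into problems: the scores come back in a strange format (for example, they always end in four zeros), and they may not be the dates I expect.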
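To make the "change the sort column dynamically" part concrete, here is a minimal Python sketch that builds the custom_score clause from the question with the date column picked per mailbox. The mailbox names `inbox`/`sent`, the `SORT_COLUMNS` table and the `build_thread_query` helper are hypothetical; only the query shape comes from the question. On its own this body still has the problems discussed in the answer (it is not wrapped in a nested query).

```python
import json

# Hypothetical mailbox names mapped to the participant date field to score on.
SORT_COLUMNS = {
    "inbox": "participants.last_received_at",
    "sent": "participants.last_sent_at",
}


def build_thread_query(participant_id, mailbox):
    """Return a custom_score request body (0.20-era query DSL) whose score is
    the date column chosen for the given mailbox (hypothetical helper)."""
    sort_column = SORT_COLUMNS[mailbox]  # KeyError for an unknown mailbox
    return {
        "query": {
            "custom_score": {
                "query": {
                    "filtered": {
                        "query": {"match_all": {}},
                        "filter": {"term": {"participants.id": participant_id}},
                    }
                },
                # The script text stays fixed; only the param changes per request.
                "params": {"sort_column": sort_column},
                "script": "doc[sort_column].value",
            }
        },
        "sort": ["_score"],
    }


if __name__ == "__main__":
    print(json.dumps(build_thread_query(3785, "inbox"), indent=2))
```

Keeping the column in `params` rather than splicing it into the script string means the request shape is identical for every mailbox.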

Below is a simplified version of the index and the relevant query. I'd be very grateful for any suggestions. (FYI - I'm using elasticsearch version 0.20.6.)

Index mapping:

{
    "mappings": {
        "message_thread": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "subject": {
                    "dynamic": true,
                    "properties": {
                        "id": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string"
                        }
                    }
                },
                "participants": {
                    "dynamic": true,
                    "properties": {
                        "id": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string"
                        },
                        "last_sent_at": {
                            "format": "dateOptionalTime",
                            "type": "date"
                        },
                        "last_received_at": {
                            "format": "dateOptionalTime",
                            "type": "date"
                        }
                    }
                },
                "messages": {
                    "dynamic": true,
                    "properties": {
                        "sender": {
                            "dynamic": true,
                            "properties": {
                                "id": {
                                    "type": "long"
                                }
                            }
                        },
                        "id": {
                            "type": "long"
                        },
                        "body": {
                            "type": "string"
                        },
                        "created_at": {
                            "format": "dateOptionalTime",
                            "type": "date"
                        },
                        "recipient": {
                            "dynamic": true,
                            "properties": {
                                "id": {
                                    "type": "long"
                                }
                            }
                        }
                    }
                },
                "version": {
                    "type": "long"
                }
            }
        }
    }
}

Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": { "participants.id": 3785 }
        },
        {
          "custom_score": {
            "query": {
              "filtered": {
                "query": { "match_all": {} },
                "filter": {
                  "term": { "participants.id": 3785 }
                }
              }
            },
            "params": { "sort_column": "participants.last_received_at" },
            "script": "doc[sort_column].value"
          }
        }
      ]
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "term": { "messages.recipient.id": 3785 }
        }
      ]
    }
  },
  "sort": [ "_score" ]
}

Solution:

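Thanks to @imotov, here is the final result. The participants were not mapped as nested in the index (the messages don't need to be). In addition, include_in_root is set on participants to simplify the query; participants are small records, so the extra index size isn't a real concern (@imotov also provided a version that works without it). He then restructured the JSON request to filter with a multi_match query using use_dis_max.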
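Why the nested mapping matters: with a plain object mapping, each field of the participants array is indexed as one multi-valued field on the thread, so the term filter on participants.id and the script reading the date are not tied to the same participant. The Python sketch below models both layouts using thread 2 from the sample data in the script. It is a simplified model, not Elasticsearch's implementation; in particular, treating `doc[field].value` as the lowest of the values is an assumption about how field data orders multi-valued fields, and `ts`, `flatten`, `flat_score` and `nested_score` are hypothetical helpers.

```python
from datetime import datetime, timezone


def ts(s):
    """Epoch milliseconds for a 'yyyy-MM-ddTHH:mm:ssZ' string."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Participants of thread 2 from the sample data (only the fields used here).
participants = [
    {"id": 57834, "last_sent_at": ts("2010-10-25T17:26:58Z")},
    {"id": 3785, "last_sent_at": ts("2010-11-25T17:26:58Z")},
]


def flatten(objs, prefix="participants"):
    """Plain object array: every field becomes one multi-valued field on the
    root document, so the link between id and date is lost."""
    flat = {}
    for obj in objs:
        for key, value in obj.items():
            flat.setdefault(prefix + "." + key, []).append(value)
    # Assumption: multi-valued numeric fields are read back in sorted order.
    return {field: sorted(values) for field, values in flat.items()}


def flat_score(objs, participant_id, field):
    root = flatten(objs)
    if participant_id not in root["participants.id"]:
        return None  # term filter does not match the thread
    # doc[field].value modeled as the first (lowest) value, whoever owns it.
    return root["participants." + field][0]


def nested_score(objs, participant_id, field):
    """Nested mapping: each participant is its own hidden document, so the
    filter and the script always look at the same participant."""
    scores = [o[field] for o in objs if o["id"] == participant_id]
    return max(scores) if scores else None  # score_mode: max


print(flat_score(participants, 3785, "last_sent_at"))    # date of 57834
print(nested_score(participants, 3785, "last_sent_at"))  # date of 3785
```

In the flattened model the script returns participant 57834's last_sent_at even though the filter matched on 3785, which is the kind of "not the date I expected" result described in the question.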

curl -XDELETE "localhost:9200/test-idx"
curl -XPUT "localhost:9200/test-idx" -d '{
  "mappings": {
    "message_thread": {
      "properties": {
        "id": {
          "type": "long"
        },
        "messages": {
          "properties": {
            "body": {
              "type": "string",
              "analyzer": "standard"
            },
            "created_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "id": {
              "type": "long"
            },
            "recipient": {
              "dynamic": "true",
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            },
            "sender": {
              "dynamic": "true",
              "properties": {
                "id": {
                  "type": "long"
                }
              }
            }
          }
        },
        "messages_count": {
          "type": "long"
        },
        "participants": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "id": {
              "type": "long"
            },
            "last_received_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "last_sent_at": {
              "type": "date",
              "format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
            },
            "name": {
              "type": "string",
              "analyzer": "standard"
            }
          }
        },
        "subject": {
          "properties": {
            "id": {
              "type": "long"
            },
            "name": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}'
curl -XPUT "localhost:9200/test-idx/message_thread/1" -d '{
  "id" : 1,
  "subject" : {"name": "Test Thread"},
  "participants" : [
    {"id" : 87793, "name" : "John Smith", "last_received_at" : null, "last_sent_at" : "2010-10-27T17:26:58Z"},
    {"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-27T17:26:58Z", "last_sent_at" : null}
  ],
  "messages" : [{
    "id" : 1,
    "body" : "This is a test.",
    "sender" : { "id" : 87793 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-27T17:26:58Z"
  }]
}'
curl -XPUT "localhost:9200/test-idx/message_thread/2" -d '{
  "id" : 2,
  "subject" : {"name": "Elastic"},
  "participants" : [
    {"id" : 57834, "name" : "Paul Johnson", "last_received_at" : "2010-11-25T17:26:58Z", "last_sent_at" : "2010-10-25T17:26:58Z"},
    {"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-25T17:26:58Z", "last_sent_at" : "2010-11-25T17:26:58Z"}
  ],
  "messages" : [{
    "id" : 2,
    "body" : "More testing of elasticsearch.",
    "sender" : { "id" : 57834 },
    "recipient" : { "id" : 3785},
    "created_at" : "2010-10-25T17:26:58Z"
  },{
    "id" : 3,
    "body" : "Reply message.",
    "sender" : { "id" : 3785 },
    "recipient" : { "id" : 57834},
    "created_at" : "2010-11-25T17:26:58Z"
  }]
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
# Using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": {
                    "match_all": {}
                  },
                  "filter": {
                    "term": {
                      "participants.id": 3785
                    }
                  }
                }
              },
              "params": {
                "sort_column": "participants.last_received_at"
              },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "multi_match": {
            "query": "test",
            "fields": ["subject.name", "participants.name", "messages.body"],
            "operator": "and",
            "use_dis_max": true
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'

# Not using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "participants",
          "score_mode": "max",
          "query": {
            "custom_score": {
              "query": {
                "filtered": {
                  "query": {
                    "match_all": {}
                  },
                  "filter": {
                    "term": {
                      "participants.id": 3785
                    }
                  }
                }
              },
              "params": {
                "sort_column": "participants.last_received_at"
              },
              "script": "doc[sort_column].value"
            }
          }
        }
      },
      "filter": {
        "query": {
          "bool": {
            "should": [{
              "match": {
                "subject.name":"test"
              }
            }, {
              "nested" : {
                "path": "participants",
                "query": {
                  "match": {
                    "name":"test"
                  }
                }
              }
            }, {
              "match": {
                "messages.body":"test"
              }
            }
            ]
          }
        }
      }
    }
  },
  "sort": ["_score"],
  "fields": []
}
'
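For reference, this plain-Python model shows the thread order that the nested custom_score query (score_mode max, sorted on _score) is intended to produce for participant 3785 over the two sample threads. It ignores the text filter: with the standard analyzer, thread 2 contains "testing" but not the token "test", so the two searches above should return only thread 1. Placing threads whose date is null last is an assumption, not verified against 0.20.6, and `order_threads` is a hypothetical helper.

```python
from datetime import datetime, timezone


def ts(s):
    """Epoch milliseconds for a 'yyyy-MM-ddTHH:mm:ssZ' string, None for null."""
    if s is None:
        return None
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Participants of the two sample threads indexed by the script above.
THREADS = {
    1: [
        {"id": 87793, "last_received_at": None, "last_sent_at": "2010-10-27T17:26:58Z"},
        {"id": 3785, "last_received_at": "2010-10-27T17:26:58Z", "last_sent_at": None},
    ],
    2: [
        {"id": 57834, "last_received_at": "2010-11-25T17:26:58Z", "last_sent_at": "2010-10-25T17:26:58Z"},
        {"id": 3785, "last_received_at": "2010-10-25T17:26:58Z", "last_sent_at": "2010-11-25T17:26:58Z"},
    ],
}


def order_threads(threads, participant_id, column):
    """Thread ids ranked as nested + custom_score + score_mode max + sort on
    _score is meant to rank them, newest date first."""
    hits = []
    for thread_id, participants in threads.items():
        mine = [p for p in participants if p["id"] == participant_id]
        if not mine:
            continue  # no matching nested doc: the thread is not a hit
        dates = [ts(p[column]) for p in mine if p[column] is not None]
        hits.append((thread_id, max(dates) if dates else None))  # score_mode max
    # Dated threads first, newest first; null dates last (an assumption).
    hits.sort(key=lambda hit: (hit[1] is None, -(hit[1] or 0)))
    return [thread_id for thread_id, _ in hits]


print(order_threads(THREADS, 3785, "last_received_at"))  # [1, 2]
print(order_threads(THREADS, 3785, "last_sent_at"))      # [2, 1]
```

Switching the column flips the order, which is the inbox/sent behavior the question is after.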

1 answer:

Answer 0 (score: 0)

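There are a couple of problems here. You are asking about nested objects, but participants are not defined as nested objects in your mapping. The second possible issue is that the score has type float, so it may not have enough precision to represent a timestamp. If you can figure out how to fit this value into a float, take a look at this example: Elastic search - tagging strength (nested/child document boosting). However, if you are developing a new system, it might be prudent to upgrade to 0.90.0.Beta1, which supports sorting on nested fields.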
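The precision point can be checked without a cluster. Assuming the score is a 32-bit IEEE-754 float (as the answer says) and the script returns epoch milliseconds, this Python sketch rounds a sample date the way a float32 would. The `OFFSET_MS` shift at the end is a hypothetical workaround for fitting the value into a float, not something taken from the answer or the linked example.

```python
import struct
from datetime import datetime, timezone


def to_float32(x):
    """Round a number to the nearest IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def epoch_millis(s):
    """Epoch milliseconds for a 'yyyy-MM-ddTHH:mm:ssZ' string."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


received = epoch_millis("2010-10-27T17:26:58Z")

# A millisecond timestamp (~1.3e12) lies between 2**40 and 2**41, where
# adjacent float32 values are 2**17 = 131072 ms apart.
score = to_float32(received)
print(received, int(score), int(score) - received)

# Rounding is monotonic: later timestamps never get a lower score, but
# timestamps within the same ~2 minute bucket collapse to one score.
window = [to_float32(t) for t in range(received, received + 600_000, 1000)]
print(len(window), "timestamps ->", len(set(window)), "distinct scores")

# Hypothetical mitigation: subtract a fixed offset and score in seconds,
# giving a resolution of a few seconds for dates within about a year of it.
OFFSET_MS = epoch_millis("2010-01-01T00:00:00Z")
seconds = (received - OFFSET_MS) / 1000
print(seconds, to_float32(seconds))
```

Because float32 rounding never reorders values, a float score can still sort threads in the right order, but dates that fall within about two minutes of each other can tie, which is consistent with the oddly rounded scores described in the question.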