Compute a new field and aggregate after joining 2 Elasticsearch indices

Time: 2014-12-19 12:14:30

Tags: join elasticsearch aggregation

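I posted another question about correlating 2 Elasticsearch indices: "Join elasticsearch indices while matching fields in nested/inner objects". I am now trying to extend it. Below is the code I created based on the answer given to that post.

Data creation: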

curl -XPUT http://localhost:9200/currencylookup/inr/1 -d '{
"conv":[
{
"currency":"usd",
"currencyname": "US Dollar",
"units_per_inr":"0.016155969",
"inr_per_unit": "61.89662756" 
},
{
"currency":"inr",
"currencyname": "Indian Rupee",
"units_per_inr":"1",
"inr_per_unit": "1" 
 },

{
"currency":"idr",
"currencyname": "Indonesian Rupiah",
"units_per_inr":"199.2576913",
"inr_per_unit": "0.005018627" 
}
]
}'

curl -XPUT "http://localhost:9200/expenses/overseas/1" -d '{ "amount":"100", "currency":"usd", "location":"USA" }'

curl -XPUT "http://localhost:9200/expenses/overseas/2" -d '{ "amount":"50", "currency":"JPY", "location":"JAPAN" }'

curl -XPUT "http://localhost:9200/expenses/overseas/3" -d '{ "amount":"50", "currency":"inr", "location":"INDIA" }'

curl -XPUT "http://localhost:9200/expenses/overseas/4" -d '{ "amount":"30", "currency" : "IDR", "location": "Indonesia"}'

curl -XPUT "http://localhost:9200/expenses/overseas/5" -d '{ "amount":"89", "currency":"USD", "location":"USA" }'

Query:

curl -XPOST "http://localhost:9200/expenses/overseas/_search?pretty" -d '{
   "query" : {
 "filtered" : {
   "filter" : {
     "terms" : {
       "currency" : {
        "index" : "currencylookup",
         "type" : "inr",
         "id" : "1",
         "path" : "conv.currency"
       },
       "_cache_key" : "currencyexchange"
     }
   }
 }
   }
 }'

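I get the expected result: 4 records from the expenses index, excluding the JPY one, whose currency is not in currencylookup.

But what I ultimately need is to get all the expense data in a single currency, which means I have to run the query the other way around, and that is where the problem arises.

curl -XPOST "http://localhost:9200/currencylookup/inr/_search?pretty" -d '{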
   "query" : {
 "filtered" : {
   "filter" : {
     "terms" : {
         "conv.currency" : {
         "index" : "expenses",
         "type" : "overseas",
         "id" : "2",
         "path" : "currency"
       },
       "_cache_key" : "currencyexchange6"
     }
   }
 }
   }
 }'

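The lookup against conv.currency does not seem to work; I cannot specify a path for it. I tried making currencylookup a flat structure, but that did not work either. I do not want to store my expenses as an array of nested/inner objects.

So, given an expense ID in the expenses index, how do I look up the appropriate exchange rate in the currencylookup index and compute a new field holding the amount in the target currency? For example: for expense ID 1, I have to look up "usd" in currencylookup, get the field inr_per_unit, and compute expenseAmountInINR.

If I get that far, I would then like to aggregate the converted expense amounts by some parameters. Is it possible to do this?

1 Answer:

Answer 0: (score: 0)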

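This may not be the right way to solve this problem, and what I did is a total hack, but what you are asking for can be done with a scripted metric aggregation. This aggregation is new in v1.4.x and still experimental (so be careful about using it in production).

I modified your currencylookup index slightly, creating one document per conversion factor: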

curl -XDELETE "http://localhost:9200/currencylookup"

curl -XPUT "http://localhost:9200/currencylookup/inr/usd" -d'
{
   "currency": "usd",
   "currencyname": "US Dollar",
   "units_per_inr": 0.016155969,
   "inr_per_unit": 61.89662756
}'
curl -XPUT "http://localhost:9200/currencylookup/inr/inr" -d'
{
   "currency": "inr",
   "currencyname": "Indian Rupee",
   "units_per_inr": 1,
   "inr_per_unit": 1
}'
curl -XPUT "http://localhost:9200/currencylookup/inr/idr" -d'
{
   "currency": "idr",
   "currencyname": "Indonesian Rupiah",
   "units_per_inr": 199.2576913,
   "inr_per_unit": 0.005018627
}'

And set up the expenses index as you had it (note that amount is a number here, not a string):

curl -XDELETE "http://localhost:9200/expenses"

curl -XPUT "http://localhost:9200/expenses/overseas/1" -d'
{ "amount":100, "currency":"usd", "location":"USA" }'
curl -XPUT "http://localhost:9200/expenses/overseas/2" -d'
{ "amount":50, "currency":"JPY", "location":"JAPAN" }'
curl -XPUT "http://localhost:9200/expenses/overseas/3" -d'
{ "amount":50, "currency":"inr", "location":"INDIA" }'
curl -XPUT "http://localhost:9200/expenses/overseas/4" -d'
{ "amount":30, "currency" : "IDR", "location": "Indonesia"}'
curl -XPUT "http://localhost:9200/expenses/overseas/5" -d'
{ "amount":89, "currency":"USD", "location":"USA" }'

Then I queried both indices together with a scripted metric aggregation, like this:

curl -XPOST "http://localhost:9200/expenses,currencylookup/_search?search_type=count" -d'
{
    "aggs": {
        "results": {
            "scripted_metric": {
                "init_script" : "_agg[\"exp\"] = []; _agg[\"cur\"] = []",
                "map_script" : "if (doc[\"_type\"].value == \"inr\") { _agg.cur.add([doc[\"currency\"].value, doc[\"inr_per_unit\"].value]) } else { _agg.exp.add([doc[\"currency\"].value, doc[\"amount\"].value]) }",
                "reduce_script" : "exp=[]; cur=[]; for (item in _aggs) { exp += item.exp; cur += item.cur }; results=[]; for (c in cur) { for (e in exp) { if (e[0] == c[0]) { results.add(e[1]*c[1]) } } }; return results;"
            }
        }
    }
}'

This produces the converted values:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 10,
      "successful": 10,
      "failed": 0
   },
   "hits": {
      "total": 8,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "results": {
         "value": [
            5508.79985284,
            6189.662756,
            50,
            0.15055881000000002
         ]
      }
   }
}
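To make the three script phases easier to follow, here is a rough plain-Python emulation of what the aggregation does. It does not talk to Elasticsearch at all. The names `init_state`, `map_doc`, `reduce_states` and `run` are made up for illustration, the round-robin split into two "shards" is an arbitrary assumption, and the currency codes are written in lowercase on the assumption that the default analyzer lowercases the `currency` field (which would explain why "USD" and "IDR" expenses match the lowercase lookup documents).

```python
# Plain-Python emulation of the init/map/reduce phases; no Elasticsearch needed.
# Documents mirror the sample data. Currency codes are lowercased here as an
# assumption about what the default analyzer stores for the "currency" field.
DOCS = [
    {"_type": "inr", "currency": "usd", "inr_per_unit": 61.89662756},
    {"_type": "inr", "currency": "inr", "inr_per_unit": 1},
    {"_type": "inr", "currency": "idr", "inr_per_unit": 0.005018627},
    {"_type": "overseas", "currency": "usd", "amount": 100},
    {"_type": "overseas", "currency": "jpy", "amount": 50},
    {"_type": "overseas", "currency": "inr", "amount": 50},
    {"_type": "overseas", "currency": "idr", "amount": 30},
    {"_type": "overseas", "currency": "usd", "amount": 89},
]


def init_state():
    """init_script: one empty state per shard."""
    return {"exp": [], "cur": []}


def map_doc(state, doc):
    """map_script: sort each document into rates or expenses by its type."""
    if doc["_type"] == "inr":
        state["cur"].append((doc["currency"], doc["inr_per_unit"]))
    else:
        state["exp"].append((doc["currency"], doc["amount"]))


def reduce_states(states):
    """reduce_script: merge shard states, then join rates to expenses."""
    exp, cur = [], []
    for state in states:
        exp += state["exp"]
        cur += state["cur"]
    results = []
    for rate_code, rate in cur:
        for expense_code, amount in exp:
            if expense_code == rate_code:
                results.append(amount * rate)
    return results


def run(docs, shards=2):
    """Distribute docs round-robin over hypothetical shards, then reduce."""
    states = [init_state() for _ in range(shards)]
    for i, doc in enumerate(docs):
        map_doc(states[i % shards], doc)
    return reduce_states(states)


if __name__ == "__main__":
    print(run(DOCS))
```

The nested loop in `reduce_states` is the actual "join": every rate is compared with every expense, and the JPY expense simply never finds a match, so it drops out of the result. The order of the values depends on how documents are spread across shards.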

To limit the results to those in usd, we can use a filtered query:

curl -XPOST "http://localhost:9200/expenses,currencylookup/_search?search_type=count" -d'
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "term": {
               "currency": "usd"
            }
         }
      }
   },
   "aggs": {
      "converted_expenses": {
         "scripted_metric": {
            "init_script": "_agg[\"exp\"] = []; _agg[\"cur\"] = []",
            "map_script": "if (doc[\"_type\"].value == \"inr\") { _agg.cur.add([doc[\"currency\"].value, doc[\"inr_per_unit\"].value]) } else { _agg.exp.add([doc[\"currency\"].value, doc[\"amount\"].value]) }",
            "reduce_script": "exp=[]; cur=[]; for (item in _aggs) { exp += item.exp; cur += item.cur }; results=[]; for (c in cur) { for (e in exp) { if (e[0] == c[0]) { results.add(e[1]*c[1]) } } }; return results;"
         }
      }
   }
}'

which gives the following:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 10,
      "successful": 10,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "converted_expenses": {
         "value": [
            5508.79985284,
            6189.662756
         ]
      }
   }
}

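I doubt this approach will scale well. Like I said, it is probably not the best way to solve the problem. If it were me, I would probably find a way to do the conversion in application code rather than in Elasticsearch. But there you go.

Here is the code I used while working on this problem: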
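As a minimal sketch of what converting in application code could look like, assume the rates and expenses have already been fetched from the two indices by whatever client you use (the fetching itself is left out). The names `RATES`, `EXPENSES`, `to_inr` and `convert_all` are hypothetical, and returning `None` for a currency with no rate is just one possible policy.

```python
# Hypothetical application-side conversion. The data below stands in for
# documents already fetched from currencylookup and expenses.
RATES = {"usd": 61.89662756, "inr": 1.0, "idr": 0.005018627}  # inr_per_unit

EXPENSES = [
    {"id": "1", "amount": 100, "currency": "usd", "location": "USA"},
    {"id": "2", "amount": 50, "currency": "JPY", "location": "JAPAN"},
    {"id": "3", "amount": 50, "currency": "inr", "location": "INDIA"},
    {"id": "4", "amount": 30, "currency": "IDR", "location": "Indonesia"},
    {"id": "5", "amount": 89, "currency": "USD", "location": "USA"},
]


def to_inr(expense, rates):
    """Return the expense amount in INR, or None if no rate is known."""
    rate = rates.get(expense["currency"].lower())  # codes compared case-insensitively
    if rate is None:
        return None
    return expense["amount"] * rate


def convert_all(expenses, rates):
    """Return copies of the expenses with an added expenseAmountInINR field."""
    return [dict(e, expenseAmountInINR=to_inr(e, rates)) for e in expenses]


if __name__ == "__main__":
    for row in convert_all(EXPENSES, RATES):
        print(row["id"], row["currency"], row["expenseAmountInINR"])
```

Unlike the aggregation, this keeps the JPY expense visible with an empty converted amount, so a missing rate can be noticed instead of silently dropped.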

http://sense.qbox.io/gist/6e7c8467ad7732c296448cec86e5c25e3c3c7326

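(To use this code in your browser, you must set http.cors.enabled: true in your Elasticsearch instance; cross-origin access is disabled by default in v1.4.2)

Edit: As far as I can tell, there does not seem to be a way to further aggregate the results using this technique.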
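If the further aggregation has to happen outside Elasticsearch, one option is to group the converted amounts in application code. The sketch below sums INR totals per location; the helper name `total_inr_by_location` and the tuple layout are made up for illustration, and skipping expenses without a known rate is an assumption, not something the original setup prescribes.

```python
from collections import defaultdict

# inr_per_unit for each currency code, as in the currencylookup documents.
RATES = {"usd": 61.89662756, "inr": 1.0, "idr": 0.005018627}

# (amount, currency, location) tuples mirroring the expenses documents.
EXPENSES = [
    (100, "usd", "USA"),
    (50, "JPY", "JAPAN"),
    (50, "inr", "INDIA"),
    (30, "IDR", "Indonesia"),
    (89, "USD", "USA"),
]


def total_inr_by_location(expenses, rates):
    """Sum converted amounts per location, skipping currencies with no rate."""
    totals = defaultdict(float)
    for amount, currency, location in expenses:
        rate = rates.get(currency.lower())
        if rate is None:
            continue  # e.g. JPY has no entry in the lookup data
        totals[location] += amount * rate
    return dict(totals)


if __name__ == "__main__":
    for location, total in total_inr_by_location(EXPENSES, RATES).items():
        print(location, total)
```

Grouping by a different parameter only means changing the dictionary key, which is the flexibility the scripted metric result (a flat list of numbers) does not offer.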