How to avoid duplicate records in jq

Date: 2018-02-15 10:13:30

Tags: jq

I have the following JSON:

{
    "hits": {
        "hits": [
            {
                "_source": {
                    "offers_data": [
                        {
                            "base_price": 198.89,
                            "shop_id": 2002,
                            "shop_name": "TheOtherShop"
                        },
                        {
                            "base_price": 223,
                            "shop_id": 2247,
                            "shop_name": "MainShop"
                        },
                        {
                            "base_price": 225,
                            "shop_id": 2247,
                            "shop_name": "MainShop"
                        }
                    ],
                    "search_result_data": {
                        "identifiers": {
                            "id": 32116
                        },
                        "shop": {
                            "id": 2247,
                            "name": "MainShop"
                        }
                    }
                }
            }
        ]
    }
}

I'm running the following command:

jq -c --raw-output '.hits.hits[]|{products_ids: ._source.search_result_data.identifiers.id,
best_shop_id: ._source.search_result_data.shop.id,
best_shop_name: (if ._source.search_result_data.shop.id>0 then ._source.search_result_data.shop.id as $shop_id|._source.offers_data[]|select(.shop_id==$shop_id).shop_name else "" end),
best_offer_base_price: (if ._source.search_result_data.shop.id>0 then ._source.search_result_data.shop.id as $shop_id|._source.offers_data[]|select(.shop_id==$shop_id).base_price else "" end)}'

I get this result:

{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":223}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":225}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":223}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":225}

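As you can see, I get duplicates. There are two offers from MainShop, so getting 2 records is expected, but as soon as I also fetch the base price, every result is duplicated again. In my real case I get 32 records instead of the 2 legitimate ones, because I'm fetching more fields. I want to avoid this extra duplication every time I add a field.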
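The duplication comes from how jq constructs objects: when a value expression inside `{...}` produces several outputs, jq emits one object per combination of outputs. In the command above, `best_shop_name` yields "MainShop" once per matching offer (twice) and `best_offer_base_price` yields 223 and 225, so 2 × 2 = 4 objects come out; every additional multi-valued field multiplies the count again. A minimal reproduction, assuming a POSIX shell with `jq` on the PATH:

```shell
#!/bin/sh
# Each value expression below produces two outputs, mimicking the two
# MainShop offers. jq builds one object per combination of outputs,
# so this prints four objects.
jq -n -c '{best_shop_name: ("MainShop", "MainShop"), best_offer_base_price: (223, 225)}'

# Collecting all constructed objects into an array and taking its
# length makes the multiplication explicit: this prints 4.
jq -n '[{best_shop_name: ("MainShop", "MainShop"), best_offer_base_price: (223, 225)}] | length'
```

The fix is therefore to make each field produce exactly one value, for example by reducing the matching offers with `min_by`, `min` or `first` before building the object.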

The icing on the cake would be to get only one record: the MainShop offer with the lowest base_price.

Thanks

Answer (score: 0)

"The icing on the cake ... the one with the lowest base_price."

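Both of the following interpretations of the question assume that, if several acceptable items share the minimum value, any one of them can be taken as "the" minimal item.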

First interpretation of the original question

.hits.hits[]._source
| (.offers_data | min_by(.base_price)) as $min_offers_data
| .search_result_data
| {products_ids: .identifiers.id}
  + ($min_offers_data
    | {best_shop_id: .shop_id,
       best_shop_name: .shop_name,
       best_offer_base_price: .base_price})

Output:

{
  "products_ids": 32116,
  "best_shop_id": 2002,
  "best_shop_name": "TheOtherShop",
  "best_offer_base_price": 198.89
}
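To reproduce this on the sample input from the question, the filter can be run with `-c` so that each record prints on one line, as in the question's own output. This is a sketch assuming a POSIX shell with `jq` on the PATH; the variable names `SAMPLE` and `FILTER` are purely illustrative, and in practice you would pass your own file instead of the inline sample.

```shell
#!/bin/sh
# The sample document from the question, compacted so the example is
# self-contained (illustrative variable name).
SAMPLE='{"hits":{"hits":[{"_source":{
  "offers_data":[
    {"base_price":198.89,"shop_id":2002,"shop_name":"TheOtherShop"},
    {"base_price":223,"shop_id":2247,"shop_name":"MainShop"},
    {"base_price":225,"shop_id":2247,"shop_name":"MainShop"}],
  "search_result_data":{"identifiers":{"id":32116},
                        "shop":{"id":2247,"name":"MainShop"}}}}]}}'

# First interpretation: the cheapest offer across all shops.
# Single quotes keep the shell from expanding $min_offers_data.
FILTER='
  .hits.hits[]._source
  | (.offers_data | min_by(.base_price)) as $min_offers_data
  | .search_result_data
  | {products_ids: .identifiers.id}
    + ($min_offers_data
       | {best_shop_id: .shop_id,
          best_shop_name: .shop_name,
          best_offer_base_price: .base_price})'

# -c prints each result object on a single line.
printf '%s\n' "$SAMPLE" | jq -c "$FILTER"
```

This prints the single record `{"products_ids":32116,"best_shop_id":2002,"best_shop_name":"TheOtherShop","best_offer_base_price":198.89}`.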

Second interpretation

Restrict attention to the offers from the shop given by .search_result_data.shop.id:

.hits.hits[]._source
| (.search_result_data.shop.id) as $shop
| (.offers_data | map(select(.shop_id == $shop)) | min_by(.base_price)) as $min_offers_data
| .search_result_data
| {products_ids: .identifiers.id}
  + ($min_offers_data
     | {best_shop_id: .shop_id,
        best_shop_name: .shop_name,
        best_offer_base_price: .base_price})

Output:

{
  "products_ids": 32116,
  "best_shop_id": 2247,
  "best_shop_name": "MainShop",
  "best_offer_base_price": 223
}
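If you would rather keep the field-by-field layout of the original command, a variant along these lines should also yield a single record. It is a hypothetical sketch, not part of the answer above: it gathers the matching offers into an array once (`$offers` is an illustrative name) and then reduces that array with `first` and `min`, so no field streams through `.offers_data[]` on its own. It assumes jq 1.5 or later (for `first`) and keeps the question's `if ... > 0 ... else "" end` fallback.

```shell
#!/bin/sh
# Sample document from the question (illustrative variable name).
SAMPLE='{"hits":{"hits":[{"_source":{
  "offers_data":[
    {"base_price":198.89,"shop_id":2002,"shop_name":"TheOtherShop"},
    {"base_price":223,"shop_id":2247,"shop_name":"MainShop"},
    {"base_price":225,"shop_id":2247,"shop_name":"MainShop"}],
  "search_result_data":{"identifiers":{"id":32116},
                        "shop":{"id":2247,"name":"MainShop"}}}}]}}'

# Collect the offers of the "best" shop into one array, then reduce it,
# so every object field produces exactly one value.
FILTER='
  .hits.hits[]._source
  | .search_result_data.shop.id as $shop_id
  | [.offers_data[] | select(.shop_id == $shop_id)] as $offers
  | {products_ids: .search_result_data.identifiers.id,
     best_shop_id: $shop_id,
     best_shop_name:
       (if $shop_id > 0 then ($offers | first | .shop_name) else "" end),
     best_offer_base_price:
       (if $shop_id > 0 then ($offers | map(.base_price) | min) else "" end)}'

printf '%s\n' "$SAMPLE" | jq -c "$FILTER"
```

On the sample input this prints the same record as the second interpretation. If the shop has no matching offers, `first` and `min` return null on the empty array, so the two fields come out as null rather than multiplying or disappearing.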