Matching on distinct counts from multiple arrays

Date: 2017-07-13 00:08:28

Tags: mongodb mongodb-query aggregation-framework

I have a collection col that contains documents like the following:

{
    "_id": 1,
    "shares": [{
            "fundcode": "000001",     
            "lastshares": 1230.20,
            "agencyno": "260",
            "netno": "260"
        },{
            "fundcode": "000002",
            "lastshares": 213124.00,
            "agencyno": "469",
            "netno": "001"
        },{
            "fundcode": "000003",
            "lastshares": 10000.80,
            "agencyno": "469",
            "netno": "002"
        }
    ],
    "trade": [{
            "fundcode": "000001",
            "c_date": "20160412",
            "agencyno": "260",
            "netno": "260",
            "bk_tradetype": "122",
            "confirmbalance": 1230.20,
            "cserialno": "10110000119601",
            "status": "1"
        },{
            "fundcode": "000002",
            "c_date": "20160506",
            "agencyno": "469",
            "netno": "001",
            "bk_tradetype": "122",
            "confirmbalance": 213124.00,
            "cserialno": "10110000119602",
            "status": "1"
        },{
            "fundcode": "000003",
            "c_date": "20170507",
            "agencyno": "469",
            "netno": "002",
            "bk_tradetype": "122",
            "confirmbalance": 10000.80,
            "netvalue": 1.0000,
            "cserialno": "10110000119602",
            "status": "1"
        }
    ]
}

How can I write a MongoDB query that selects the same thing as the following SQL?

SELECT _id
FROM col 
WHERE col.shares.lastshares > 1000 
  AND col.trade.agencyno = '469'
GROUP BY _id
HAVING COUNT(DISTINCT col.shares.fundcode) > 2
  AND COUNT(DISTINCT col.trade.fundcode) > 2

I tried an aggregation pipeline that runs $unwind, $group and $match twice, but I did not get the right answer. Thanks for your help.

1 answer:

Answer 0: (score: 0)

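It does not help that the supplied sample fails the conditions. That is simply because the "trade" array yields only 2 distinct matches, which is not enough to satisfy the "greater than 2" constraint in the query.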

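The structure is certainly different from an RDBMS, so "sub-queries" do not apply, but at least you have modeled the data as arrays. Ideally we would not use $unwind at all.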

So all we need to do is "count" the "distinct" matches within each array. This can be done inside a $redact stage, using $map, $setDifference and $size as the main operations:

db.getCollection('collection').aggregate([
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
    "trade.agencyno": "469"
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$gt": [
            { "$size": {
              "$setDifference": [
                { "$map": {
                  "input": "$shares",
                  "as": "el",
                  "in": {
                    "$cond": {
                      "if": { "$gt": [ "$$el.lastshares", 1000 ] },
                      "then": "$$el.fundcode",
                      "else": false
                    }
                  }
                }},
                [false]
              ]           
            }},
            2     
          ]},
          { "$gt": [
            { "$size": {    
              "$setDifference": [
                { "$map": {
                  "input": "$trade",
                  "as": "el",
                  "in": {
                    "$cond": {
                      "if": { "$eq": [ "$$el.agencyno", "469" ] },
                      "then": "$$el.fundcode",
                      "else": false
                    }  
                  }
                }},
                [false]
              ]  
            }},
            2
          ]}
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    } 
  }},
  /*
  { "$addFields": {
    "shares": {
      "$filter": {
        "input": "$shares",
        "as": "el",
        "cond": { "$gt": [ "$$el.lastshares", 1000 ] }
      }
    },
    "trade": {
      "$filter": {
        "input": "$trade",
        "as": "el",
        "cond": { "$eq": [ "$$el.agencyno", "469" ] }   
      }
    }
  }}
  */
])

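This keeps the pipeline compatible with MongoDB 2.6 and later. The commented-out $addFields stage (which needs MongoDB 3.4 or later) is only there so that you can see the result of the "filtering". It is not required, since the question asks only for the document _id, and returning the whole document is simply less work. Add a $project for _id at the end if you really want just that.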

Alternatively, as a matter of taste, you can use $filter with MongoDB 3.2 and later, although the syntax in this case is actually a little longer:

db.getCollection('collection').aggregate([
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
    "trade.agencyno": "469"
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$gt": [
            { "$size": {
              "$setDifference": [
                { "$map": {
                  "input": { 
                    "$filter": {
                      "input": "$shares",
                      "as": "el",
                      "cond": { "$gt": [ "$$el.lastshares", 1000 ] }
                    }
                  },
                  "as": "el",
                  "in": "$$el.fundcode"
                }},
                []
              ]           
            }},
            2     
          ]},
          { "$gt": [
            { "$size": {    
              "$setDifference": [
                { "$map": {
                  "input": {
                    "$filter": {
                      "input": "$trade",
                      "as": "el",
                      "cond": { "$eq": [ "$$el.agencyno", "469" ] }   
                    }
                  },
                  "as": "el",
                  "in": "$$el.fundcode"       
                }},
                []
              ]  
            }},
            2
          ]}
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    } 
  }},
  /*
  { "$addFields": {
    "shares": {
      "$filter": {
        "input": "$shares",
        "as": "el",
        "cond": { "$gt": [ "$$el.lastshares", 1000 ] }
      }
    },
    "trade": {
      "$filter": {
        "input": "$trade",
        "as": "el",
        "cond": { "$eq": [ "$$el.agencyno", "469" ] }   
      }
    }
  }}
  */
])

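The basic principle here is that the:

 having (count(distinct fundcode))...

condition is implemented by applying $size and $setDifference to the "filtered" array content. The "GROUP BY" part is not even needed, because each "array" already represents the relation in "grouped" form. Think of the whole $redact statement as the "HAVING".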
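
To make the principle concrete, here is a minimal plain-JavaScript sketch (not MongoDB code) of what the $map / $setDifference / $size combination computes. The helpers `setDifference` and `distinctCount` and the sample array are hypothetical, introduced only for illustration; they mimic the set semantics of the operators, which discard duplicates.

```javascript
// Hypothetical stand-in for $setDifference: unique elements of `a` not in `b`.
function setDifference(a, b) {
  return [...new Set(a)].filter((x) => !b.includes(x));
}

// Mimics $size of $setDifference over $map with $cond: elements failing the
// condition become `false`, which is then removed along with duplicates.
function distinctCount(array, cond, field) {
  const mapped = array.map((el) => (cond(el) ? el[field] : false));
  return setDifference(mapped, [false]).length;
}

// Illustrative data only (not taken from the question).
const shares = [
  { fundcode: "A", lastshares: 1500 },
  { fundcode: "A", lastshares: 2500 }, // duplicate fundcode, counted once
  { fundcode: "B", lastshares: 500 },  // fails the condition
  { fundcode: "C", lastshares: 3000 },
];

const count = distinctCount(shares, (el) => el.lastshares > 1000, "fundcode");
console.log(count);     // 2
console.log(count > 2); // false, so a "HAVING ... > 2" test would reject this
```

The $filter variant reaches the same count by dropping non-matching elements first and then de-duplicating against an empty array.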

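If your MongoDB is truly ancient and you cannot use these forms, the same thing is still possible with $unwind. This time we use $addToSet to collect the "distinct" entries: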

db.getCollection('collection').aggregate([
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
    "trade.agencyno": "469"
  }},
  { "$unwind": "$shares" },
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
  }},
  { "$group": {
    "_id": "$_id",
    "shares": { "$addToSet": "$shares.fundcode" },
    "trade": { "$first": "$trade" }
  }},
  { "$unwind": "$trade" },
  { "$match": {
    "trade.agencyno": "469"      
  }},
  { "$group": {
    "_id": "$_id",
    "shares": { "$first": "$shares" },
    "trade": { "$addToSet": "$trade.fundcode" }  
  }},
  { "$match": {
    "shares.2": { "$exists": true },
    "trade.2": { "$exists": true }  
  }}
])

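In this case the "HAVING" is represented by the final $match clause, where notation such as "shares.2": { "$exists": true } asks whether the array being tested has a "third index" (index 2). That in turn means it holds "more than two" entries, which is the point of the condition.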
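
The positional $exists test can be read as a length check. The following plain-JavaScript sketch shows that equivalence for the kind of arrays $addToSet produces. The helper name `hasIndex` is hypothetical, and this only mirrors the logic; it is not how the server evaluates the query.

```javascript
// Hypothetical helper: true when `arr` has an element at zero-based `index`,
// which is the same as arr.length > index.
function hasIndex(arr, index) {
  return Array.isArray(arr) && arr.length > index;
}

// Sets as $addToSet would build them for the question's sample document.
const shares = ["000003", "000002", "000001"];
const trade = ["000003", "000002"];

console.log(hasIndex(shares, 2)); // true: three entries, so "more than two"
console.log(hasIndex(trade, 2));  // false: only two entries
console.log(hasIndex(trade, 1));  // true: "at least two"
```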

But the document only has "two" matches

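As stated above, it would have helped if you had supplied a document that actually meets the conditions you asked for. Unfortunately the "trade" array in the supplied document does not reach the required number of matches.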
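
As a quick check of those counts, this plain-JavaScript sketch applies the two conditions to the relevant fields of the sample document from the question. The variable and function names are illustrative, and the code only mirrors the logic rather than running against MongoDB.

```javascript
// Relevant fields copied from the question's sample document.
const doc = {
  _id: 1,
  shares: [
    { fundcode: "000001", lastshares: 1230.20 },
    { fundcode: "000002", lastshares: 213124.00 },
    { fundcode: "000003", lastshares: 10000.80 },
  ],
  trade: [
    { fundcode: "000001", agencyno: "260" },
    { fundcode: "000002", agencyno: "469" },
    { fundcode: "000003", agencyno: "469" },
  ],
};

// Count distinct fundcode values among the elements passing `cond`.
function distinctFundcodes(array, cond) {
  return new Set(array.filter(cond).map((el) => el.fundcode)).size;
}

const shareCount = distinctFundcodes(doc.shares, (el) => el.lastshares > 1000);
const tradeCount = distinctFundcodes(doc.trade, (el) => el.agencyno === "469");

console.log(shareCount, tradeCount);            // 3 2
console.log(shareCount > 2 && tradeCount > 2);  // false: original condition
console.log(shareCount > 2 && tradeCount >= 2); // true: relaxed condition
```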

To make the condition match the supplied document, we relax the "trade" test to $gte with 2:

db.getCollection('collection').aggregate([
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
    "trade.agencyno": "469"
  }},
  { "$redact": {
    "$cond": {
      "if": {
        "$and": [
          { "$gt": [
            { "$size": {
              "$setDifference": [
                { "$map": {
                  "input": "$shares",
                  "as": "el",
                  "in": {
                    "$cond": {
                      "if": { "$gt": [ "$$el.lastshares", 1000 ] },
                      "then": "$$el.fundcode",
                      "else": false
                    }
                  }
                }},
                [false]
              ]           
            }},
            2     
          ]},
          { "$gte": [
            { "$size": {    
              "$setDifference": [
                { "$map": {
                  "input": "$trade",
                  "as": "el",
                  "in": {
                    "$cond": {
                      "if": { "$eq": [ "$$el.agencyno", "469" ] },
                      "then": "$$el.fundcode",
                      "else": false
                    }  
                  }
                }},
                [false]
              ]  
            }},
            2
          ]}
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    } 
  }},
  { "$addFields": {
    "shares": {
      "$filter": {
        "input": "$shares",
        "as": "el",
        "cond": { "$gt": [ "$$el.lastshares", 1000 ] }
      }
    },
    "trade": {
      "$filter": {
        "input": "$trade",
        "as": "el",
        "cond": { "$eq": [ "$$el.agencyno", "469" ] }   
      }
    }
  }}
])

In that form the output is:

{
    "_id" : 1.0,
    "shares" : [ 
        {
            "fundcode" : "000001",
            "lastshares" : 1230.2,
            "agencyno" : "260",
            "netno" : "260"
        }, 
        {
            "fundcode" : "000002",
            "lastshares" : 213124.0,
            "agencyno" : "469",
            "netno" : "001"
        }, 
        {
            "fundcode" : "000003",
            "lastshares" : 10000.8,
            "agencyno" : "469",
            "netno" : "002"
        }
    ],
    "trade" : [ 
        {
            "fundcode" : "000002",
            "c_date" : "20160506",
            "agencyno" : "469",
            "netno" : "001",
            "bk_tradetype" : "122",
            "confirmbalance" : 213124.0,
            "cserialno" : "10110000119602",
            "status" : "1"
        }, 
        {
            "fundcode" : "000003",
            "c_date" : "20170507",
            "agencyno" : "469",
            "netno" : "002",
            "bk_tradetype" : "122",
            "confirmbalance" : 10000.8,
            "netvalue" : 1.0,
            "cserialno" : "10110000119602",
            "status" : "1"
        }
    ]
}

Or with $unwind, relaxing the "trade" length test so that it only requires 2 entries (index 1):

db.getCollection('collection').aggregate([
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
    "trade.agencyno": "469"
  }},
  { "$unwind": "$shares" },
  { "$match": {
    "shares.lastshares": { "$gt": 1000 },
  }},
  { "$group": {
    "_id": "$_id",
    "shares": { "$addToSet": "$shares.fundcode" },
    "trade": { "$first": "$trade" }
  }},
  { "$unwind": "$trade" },
  { "$match": {
    "trade.agencyno": "469"      
  }},
  { "$group": {
    "_id": "$_id",
    "shares": { "$first": "$shares" },
    "trade": { "$addToSet": "$trade.fundcode" }  
  }},
  { "$match": {
    "shares.2": { "$exists": true },
    "trade.1": { "$exists": true }  
  }}
])

Which returns:

{
    "_id" : 1.0,
    "shares" : [ 
        "000003", 
        "000002", 
        "000001"
    ],
    "trade" : [ 
        "000003", 
        "000002"
    ]
}

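Both forms, of course, identify the "document" that meets the conditions the original query asks for, so regardless of what is returned the basic result is the same. You can always $project just the _id if you must.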
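
If only the _id is wanted, a final $project stage can be appended to any of the pipelines above. A minimal sketch, where `withIdOnly` is a hypothetical helper and the shortened stage list is a placeholder for the stages already shown:

```javascript
// Hypothetical helper: returns a new pipeline with an _id-only projection
// appended; the input array is left unmodified.
function withIdOnly(pipeline) {
  return pipeline.concat([{ "$project": { "_id": 1 } }]);
}

// Shortened example; the $redact or $unwind stages would follow the $match.
const pipeline = withIdOnly([
  { "$match": { "shares.lastshares": { "$gt": 1000 }, "trade.agencyno": "469" } },
]);

console.log(JSON.stringify(pipeline[pipeline.length - 1]));
// In the mongo shell: db.getCollection('collection').aggregate(pipeline)
```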