Aggregation to get the total count of key/value pairs in an array, grouping the values of similarly named fields

Time: 2015-09-15 02:55:18

Tags: mongodb mongodb-query aggregation-framework

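My document structure is shown below. Each array element holds "n" and "v" as the key and value for a different type of data. I need to group by the value whose "n" is the "ipaddress" and count the total number of distinct combinations in the collection. However, the key names are similar but not identical (for example: ip, ip_addr and ipaddr).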

> db.final.find().pretty()
{
        "_id" : 2,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "13438"
                }
        ]
}
{
        "_id" : 5,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 1,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "2"
                },
                {
                        "n" : "ip_addr",
                        "v" : "2.2.2.2"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 3,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "NW"
                },
                {
                        "n" : "logtype",
                        "v" : "3"
                },
                {
                        "n" : "ipaddr",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "tcp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 4,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "LA"
                },
                {
                        "n" : "logtype",
                        "v" : "3"
                },
                {
                        "n" : "ipaddr",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}
{
        "_id" : 6,
        "props" : [
                {
                        "n" : "loc",
                        "v" : "LA"
                },
                {
                        "n" : "logtype",
                        "v" : "1"
                },
                {
                        "n" : "ip",
                        "v" : "1.1.1.1"
                },
                {
                        "n" : "pro",
                        "v" : "udp"
                },
                {
                        "n" : "port",
                        "v" : "53"
                }
        ]
}

The query selection criteria are as follows:

  1. if "loc" is "NW" and "logtype" is "1", then "ipaddress" = "ip"
  2. if "loc" is "NW" and "logtype" is "2", then "ipaddress" = "ip_addr"
  3. if "loc" is "NW" and "logtype" is "3", then "ipaddress" = "ipaddr"
  4. "port" is "53"
  5. "pro" is "udp" or "tcp"
  6. group by "ipaddress"

I want a result like this:

    {"ipaddress" : "2.2.2.2" , count : 1}
    {"ipaddress" : "1.1.1.1" , count : 2}
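As a cross-check, the criteria above can be written as ordinary JavaScript outside MongoDB. This is only an illustrative sketch, not an aggregation pipeline: the `docs` array mirrors the sample collection, and the names `ipFieldByLogtype` and `countByIpAddress` are hypothetical helpers introduced here.

```javascript
// Illustrative only: plain JavaScript, not an aggregation pipeline.
// The rows mirror the sample collection; all helper names are hypothetical.
const docs = [
  { _id: 2, props: [["loc", "NW"], ["logtype", "1"], ["ip", "1.1.1.1"], ["pro", "tcp"], ["port", "13438"]] },
  { _id: 5, props: [["loc", "NW"], ["logtype", "1"], ["ip", "1.1.1.1"], ["pro", "tcp"], ["port", "53"]] },
  { _id: 1, props: [["loc", "NW"], ["logtype", "2"], ["ip_addr", "2.2.2.2"], ["pro", "udp"], ["port", "53"]] },
  { _id: 3, props: [["loc", "NW"], ["logtype", "3"], ["ipaddr", "1.1.1.1"], ["pro", "tcp"], ["port", "53"]] },
  { _id: 4, props: [["loc", "LA"], ["logtype", "3"], ["ipaddr", "1.1.1.1"], ["pro", "udp"], ["port", "53"]] },
  { _id: 6, props: [["loc", "LA"], ["logtype", "1"], ["ip", "1.1.1.1"], ["pro", "udp"], ["port", "53"]] }
].map(d => ({ _id: d._id, props: d.props.map(([n, v]) => ({ n, v })) }));

// Criteria 1-3: which key name carries the address for each logtype.
const ipFieldByLogtype = { "1": "ip", "2": "ip_addr", "3": "ipaddr" };

function countByIpAddress(documents) {
  const counts = new Map();
  for (const doc of documents) {
    // Flatten [{n, v}, ...] into a single object { n: v, ... }.
    const kv = {};
    for (const { n, v } of doc.props) kv[n] = v;

    // Shared "loc" condition plus criteria 4 and 5.
    if (kv.loc !== "NW" || kv.port !== "53") continue;
    if (kv.pro !== "udp" && kv.pro !== "tcp") continue;

    // Pick the address field that belongs to this logtype; skip if absent.
    const ip = kv[ipFieldByLogtype[kv.logtype]];
    if (ip === undefined) continue;

    // Criterion 6: count per address.
    counts.set(ip, (counts.get(ip) || 0) + 1);
  }
  return [...counts].map(([ipaddress, count]) => ({ ipaddress, count }));
}

console.log(countByIpAddress(docs));
```

On the sample data this yields the same two counts as the desired result, which gives a baseline for checking any pipeline.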
    

This is what I have done so far:

    db.final.aggregate([
        { "$match": {
            "$and": [
                {"props" : {"$elemMatch": { "n": "port", "v": "53" }}},
                {"props" : {"$elemMatch": { "n": "pro", "v": {"$in" : [/udp/, /tcp/]} }}}
            ]
        }},
        { "$unwind": "$props" },
        {
            "$project": {
                "_ipaddress": {
                    "$cond": {
                        "if": { "$eq": [ "$props.n", "ip" ] },
                        "then": "$props.v",
                        "else": {
                            "$cond": {
                                "if": { "$eq": [ "$props.n", "ip_addr" ] },
                                "then": "$props.v",
                                "else": {
                                    "$cond" : {
                                        "if": { "$eq": [ "$props.n", "ipaddr" ] },
                                        "then": "$props.v",
                                        "else" : 0
                                    }
                                }
                            }
                        }
                    }
                },
                "_id": 1,
                "props" : 1
            }
        },
        { "$group": {
            "_id": "$_id",
            "_ipaddress": {
                "$min": {
                    "$cond": [ { "$ne": [ "$_ipaddress", 0 ] }, "$_ipaddress", false ]
                }
            },
            "pro": {
                "$min": {
                    "$cond": [ { "$eq": [ "$props.n", "pro" ] }, "$props.v", false ]
                }
            },
            "logtype": {
                "$min": {
                    "$cond": [ { "$eq": [ "$props.n", "logtype" ] }, "$props.v", false ]
                }
            },
            "port": {
                "$min": {
                    "$cond": [ { "$eq": [ "$props.n", "port" ] }, "$props.v", false ]
                }
            }
        } },
        { "$group": {
            "_id": {
                "_ipaddress": "$_ipaddress",
            },
            "count": { "$sum": 1 }
        }}
    ])
    

But I don't know how to combine the "loc" and "logtype" conditions.

1 Answer:

Answer 0: (score: 0)

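Here each document has an array that stores the related pieces of information together. This design is not well suited to the query you are trying to run. As I understand it, $unwind does not help here, because it splits the array elements into separate documents.

The solution I found is to turn each array element into a key: value pair on the document, which worked: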

> db.final.aggregate([
 ... {"$project" : {"pro" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "pro" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1}},
 ... {"$project" : {"loc" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "loc" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1}},
 ... {"$project" : {"port" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "port" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1}},
 ... {"$project" : {"ip" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ip" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1,port:1}},
 ... {"$project" : {"ip_addr" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ip_addr" ] }, "$$array_elem.v", null]}}}, [null] ] },props:1,pro:1,loc:1,port:1,ip:1}},
 ... {"$project" : {"ipaddr" : { "$setDifference" :[{"$map" : {"input" : "$props","as" : "array_elem", "in" : { "$cond" : [ { "$eq" : [ "$$array_elem.n", "ipaddr" ] }, "$$array_elem.v", null]}}}, [null] ] },pro:1,loc:1,port:1,ip:1,ip_addr:1}}
 ... ])

// Result (using the data given in your question):
{ "_id" : 2, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "13438" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }
{ "_id" : 5, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }
{ "_id" : 1, "pro" : [ "udp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ "2.2.2.2" ], "ipaddr" : [ ] }
{ "_id" : 3, "pro" : [ "tcp" ], "loc" : [ "NW" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ ], "ipaddr" : [ "1.1.1.1" ] }
{ "_id" : 4, "pro" : [ "udp" ], "loc" : [ "LA" ], "port" : [ "53" ], "ip" : [ ], "ip_addr" : [ ], "ipaddr" : [ "1.1.1.1" ] }
{ "_id" : 6, "pro" : [ "udp" ], "loc" : [ "LA" ], "port" : [ "53" ], "ip" : [ "1.1.1.1" ], "ip_addr" : [ ], "ipaddr" : [ ] }

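Each element of the array has now become a key with its value. You can apply your conditions to these documents and get the desired result. I have not given the complete answer here, since you have already done the rest.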
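One possible way to apply the conditions to the flattened documents is sketched below. It rests on assumptions that are not in the pipeline above: a further $project stage would have to extract "logtype" in the same way as the other keys (and carry it through the later stages), and the names `followUpStages` and `flattenStages` are hypothetical.

```javascript
// Sketch only. Assumes each flattened document also has a "logtype" array,
// produced by one more $project stage like those shown above (not included there).
const followUpStages = [
  // Shared "loc" condition plus the port and protocol criteria. A query on an
  // array field matches when any element of the array matches.
  { $match: {
      loc: "NW",
      port: "53",
      pro: { $in: ["udp", "tcp"] },
      logtype: { $in: ["1", "2", "3"] }
  } },
  // Every extracted field is an array; unwind logtype into a scalar.
  { $unwind: "$logtype" },
  // logtype decides which of the three fields supplies the address.
  { $project: {
      ipaddress: {
        $cond: [
          { $eq: ["$logtype", "1"] }, "$ip",
          { $cond: [{ $eq: ["$logtype", "2"] }, "$ip_addr", "$ipaddr"] }
        ]
      }
  } },
  // Unwinding drops documents whose chosen address array is empty.
  { $unwind: "$ipaddress" },
  // Count documents per address.
  { $group: { _id: "$ipaddress", count: { $sum: 1 } } },
  // Rename _id to match the requested output shape.
  { $project: { _id: 0, ipaddress: "$_id", count: 1 } }
];

// In the mongo shell these would follow the flattening stages, for example:
// db.final.aggregate(flattenStages.concat(followUpStages))
```

Because the inner $cond falls through to "$ipaddr", the first $match restricts "logtype" to the three known values so that no other log type is counted by accident.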

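Notes:

  1. MongoDB does not allow $elemMatch in aggregation expressions (https://jira.mongodb.org/browse/SERVER-14876), and the workaround for this is a longer aggregation pipeline.
  2. If the position of the elements in the array is reliable, the position alone can be used to convert the array elements to key: value pairs.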
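The positional idea in the second note could look like the sketch below. It assumes MongoDB 3.2 or later (for `$arrayElemAt`) and that every document keeps the order seen in the sample data: loc, logtype, the address key, pro, port. The names `positionalPipeline` and `ipName` are hypothetical.

```javascript
// Sketch only. Assumes MongoDB 3.2+ and a fixed element order in "props":
// 0 = loc, 1 = logtype, 2 = address key, 3 = pro, 4 = port.
const positionalPipeline = [
  // "$props.v" is the array of all "v" values; pick each one by position.
  { $project: {
      loc: { $arrayElemAt: ["$props.v", 0] },
      logtype: { $arrayElemAt: ["$props.v", 1] },
      ipName: { $arrayElemAt: ["$props.n", 2] },
      ipaddress: { $arrayElemAt: ["$props.v", 2] },
      pro: { $arrayElemAt: ["$props.v", 3] },
      port: { $arrayElemAt: ["$props.v", 4] }
  } },
  // Apply all selection criteria on the now-scalar fields.
  { $match: {
      loc: "NW",
      port: "53",
      pro: { $in: ["udp", "tcp"] },
      $or: [
        { logtype: "1", ipName: "ip" },
        { logtype: "2", ipName: "ip_addr" },
        { logtype: "3", ipName: "ipaddr" }
      ]
  } },
  // Count documents per address and rename _id for the output.
  { $group: { _id: "$ipaddress", count: { $sum: 1 } } },
  { $project: { _id: 0, ipaddress: "$_id", count: 1 } }
];

// In the mongo shell: db.final.aggregate(positionalPipeline)
```

If the order can differ between documents, this approach reads the wrong values without raising an error, so it only fits collections where the layout is guaranteed.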