Grouping array documents by sequence: MongoDB groupby or mapreduce?

Date: 2019-07-11 19:07:17

Tags: arrays mongodb mapreduce grouping pymongo-3.x

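In MongoDB, I have a collection of documents, each containing an array of records. I want to group the records by identical tag while preserving their natural order.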

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": ISODate("2019-01-07T09:06:56Z"),
                    "score": 1
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "ou",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:20Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:37Z"),
                    "score": 1
                }
            ]
    }

I want to group (and aggregate) the records by runs of consecutive identical tags, not simply by unique tag.

Desired output:

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0,
                    "nbRecords": 1
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 1,
                    "nbRecords": 2
                }
            ]
    }
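
As a point of reference, the desired grouping can be sketched in plain Python with itertools.groupby, which only merges adjacent items that share a key. This is a minimal illustration, not a MongoDB solution: the helper name group_consecutive is hypothetical, and ISO strings stand in for the ISODate values.

```python
from itertools import groupby

# Sample data mirroring the document above; plain strings replace ISODate values.
records = [
    {"tag": "ch", "unixTime": "2019-01-07T09:06:56Z", "score": 1},
    {"tag": "u", "unixTime": "2019-01-07T09:07:06Z", "score": 0},
    {"tag": "ou", "unixTime": "2019-01-07T09:07:06Z", "score": 0},
    {"tag": "u", "unixTime": "2019-01-07T09:07:20Z", "score": 0},
    {"tag": "u", "unixTime": "2019-01-07T09:07:37Z", "score": 1},
]


def group_consecutive(records):
    """Merge adjacent records that share a tag (hypothetical helper)."""
    grouped = []
    for tag, run in groupby(records, key=lambda r: r["tag"]):
        run = list(run)  # materialize the run so it can be traversed several times
        grouped.append({
            "tag": tag,
            "unixTime": [r["unixTime"] for r in run],
            "score": sum(r["score"] for r in run),
            "nbRecords": len(run),
        })
    return grouped


result = group_consecutive(records)
```

The two trailing "u" records end up in one group, while the earlier isolated "u" record stays on its own.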

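Groupby

It seems that MongoDB's '$group' aggregation stage (there is no '$groupby' operator) groups by unique field value, as if the array had first been sorted, so the original order of the records is lost: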

    db.coll.aggregate(
        [
            {"$unwind": "$records"},
            {"$group":
                {
                    "_id": {
                        "tag": "$records.tag",
                        "day": "$day"
                    },
                    ...
                }
            }
        ]
    )

Returns:

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z"), ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 2,
                    "nbRecords": 3
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                }
            ]
    }
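
To make the difference concrete, here is a rough plain-Python analogue of grouping by unique key. It is only an illustration of the bucketing behaviour, not of how MongoDB implements the stage; the function name count_by_unique_key is hypothetical.

```python
from collections import OrderedDict

# Tag sequence taken from the sample document.
tags = ["ch", "u", "ou", "u", "u"]


def count_by_unique_key(tags):
    """Count items per distinct tag, ignoring where they appear in the sequence."""
    counts = OrderedDict()
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
    return counts


# Both runs of "u" fall into the same bucket, so the isolated "u" is no longer distinct.
unique_counts = count_by_unique_key(tags)
```

Because every "u" lands in one bucket regardless of position, three groups come out instead of the four wanted.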

Map/Reduce

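Since I am currently using the pymongo driver, I implemented the solution in Python, using itertools.groupby as a generator to perform a grouping that respects the natural order. However, because the processing takes an extremely long time, I run into a server timeout (a cursor.NotFound error).

Any ideas on how to use MongoDB's mapreduce feature directly to perform the equivalent of itertools.groupby() in Python?

Thanks a lot for your help. I am using pymongo driver 3.8 and MongoDB 4.0.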

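2 answers:

Answer 0: (score: 0)

Ni! Run through the records array, adding a new integer index that increments whenever the groupby target changes, then use mongo operations on that index. .~´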
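
The indexing step suggested here could look roughly like the following sketch. It assumes the index is computed client-side in Python; the helper name add_run_index and the field name "run" are hypothetical and do not come from the answer.

```python
def add_run_index(records, key="tag"):
    """Return copies of the records with a 'run' counter that increments on each key change."""
    indexed = []
    run = -1
    previous = object()  # sentinel that never equals a real tag value
    for rec in records:
        if rec[key] != previous:
            run += 1
            previous = rec[key]
        indexed.append(dict(rec, run=run))  # copy the record, leaving the input untouched
    return indexed


records = [{"tag": t} for t in ["ch", "u", "ou", "u", "u"]]
indexed = add_run_index(records)
run_ids = [r["run"] for r in indexed]
```

With such a field stored on each record, grouping on the day and the run index rather than on the tag alone should keep non-adjacent runs of the same tag apart. Since '$group' does not guarantee the order of its output, a sort on the run index would probably be needed afterwards.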

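Answer 1: (score: 0)

Following @Ale's recommendation, and with no tips on how to do this in MongoDB, I switched back to a Python implementation that solves the cursor.NotFound problem.

I think this could be done inside MongoDB, but the following works: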

    import itertools

    # db is assumed to be an existing pymongo Database handle
    for r in db.coll.find():
        session = []
        # groupby only merges adjacent records, so the array order is preserved
        for tag, time_score in itertools.groupby(r["records"], key=lambda x: x["tag"]):
            time_score = list(time_score)
            session.append({
                "tag": tag,
                "start": time_score[0]["unixTime"],
                "end": time_score[-1]["unixTime"],
                "ca": sum(n["score"] for n in time_score),
                "nb_records": len(time_score),
            })
        # replace the raw records with the grouped sessions
        db.coll.update_one(
            {"_id": r["_id"]},
            {
                "$unset": {"records": ""},
                "$set": {"sessions": session},
            })