Question

I store documents in MongoDB like this:

{ "_id" : "XBpNKbdGSgGfnC2MJ", "po" : 72134185, "machine" : 40940, "location" : "02A01", "inDate" : ISODate("2017-07-19T06:10:13.059Z"), "requestDate" : ISODate("2017-07-19T06:17:04.901Z"), "outDate" : ISODate("2017-07-19T06:30:34Z") }

I would like to get per-day counts for both inDate and outDate.

I can retrieve the number of documents per outDate day on one side, and per inDate day on the other, but I would like both counts in the same result.

Currently, I use this pipeline:

      $group: {
        _id: {
          yearA: { $year: '$inDate' },
          monthA: { $month: '$inDate' },
          dayA: { $dayOfMonth: '$inDate' },
        },
        count: { $sum: 1 },
      },

which gives me:

{ "_id" : { "year" : 2017, "month" : 7, "day" : 24 }, "count" : 1 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 21 }, "count" : 11 }
{ "_id" : { "year" : 2017, "month" : 7, "day" : 19 }, "count" : 20 }

But I would like, if possible, a single result per day that carries both counts (for example a countIn and a countOut field).

Any ideas? Thanks a lot :-)

Answer 1

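You can also split each document at the source: combine the two dates into an array of entries whose "type" is "in" or "out". Use $map with $cond to pick the right field for each type (and $filter to drop empty dates), $unwind the array, and then count per day by checking the type again with $cond: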

collection.aggregate([
  { "$project": {
    "dates": {
      "$filter": {
        "input": { 
          "$map": {
            "input": [ "in", "out" ],
            "as": "type",
            "in": {
              "type": "$$type",
              "date": {
                "$cond": {
                  "if": { "$eq": [ "$$type", "in" ] },
                  "then": "$inDate",
                  "else": "$outDate"
                }
              }
            }
          }
        },
        "as": "dates",
        "cond": { "$ne": [ "$$dates.date", null ] }
      }
    }
  }},
  { "$unwind": "$dates" },
  { "$group": {
    "_id": {
      "year": { "$year": "$dates.date" },
      "month": { "$month": "$dates.date" },
      "day": { "$dayOfMonth": "$dates.date" }
    },
    "countIn": {
      "$sum": {
        "$cond": {
          "if": { "$eq": [ "$dates.type", "in" ]  },
          "then": 1,
          "else": 0
        }
      }
    },
    "countOut": {
      "$sum": {
        "$cond": {
          "if": { "$eq": [ "$dates.type", "out" ]  },
          "then": 1,
          "else": 0
        }
      }
    }
  }}
])
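To make the data flow of these stages concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved) that mimics what the pipeline computes. The function name `countInOutPerDay` is hypothetical, and the sketch assumes UTC day boundaries, which is what `$year`, `$month` and `$dayOfMonth` use by default.

```javascript
// Hypothetical in-memory emulation of the aggregation above; plain JavaScript only.
function countInOutPerDay(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $map + $cond: turn each document into [{ type: "in", ... }, { type: "out", ... }]
    const entries = ["in", "out"]
      .map((type) => ({ type, date: type === "in" ? doc.inDate : doc.outDate }))
      // roughly what the $filter step is for: drop entries without a date
      .filter((entry) => entry.date != null);

    // $unwind: handle each entry separately
    for (const { type, date } of entries) {
      // $group _id: the UTC calendar day, as $year/$month/$dayOfMonth compute it
      const id = {
        year: date.getUTCFullYear(),
        month: date.getUTCMonth() + 1,
        day: date.getUTCDate(),
      };
      const key = `${id.year}-${id.month}-${id.day}`;
      if (!groups.has(key)) {
        groups.set(key, { _id: id, countIn: 0, countOut: 0 });
      }
      // $sum over $cond: add 1 to the counter that matches the entry's type
      const group = groups.get(key);
      if (type === "in") group.countIn += 1;
      else group.countOut += 1;
    }
  }
  return [...groups.values()];
}

// Usage with the dates of the sample document from the question
const perDay = countInOutPerDay([
  {
    inDate: new Date("2017-07-19T06:10:13.059Z"),
    outDate: new Date("2017-07-19T06:30:34Z"),
  },
]);
console.log(JSON.stringify(perDay));
// [{"_id":{"year":2017,"month":7,"day":19},"countIn":1,"countOut":1}]
```

The sketch returns days in order of first appearance; the real `$group` stage makes no ordering guarantee, so a `$sort` stage would be needed if order matters.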

This is a safe approach: no matter how much data you feed it, there is no risk of breaching the BSON size limit.

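Personally, I would rather run the two aggregations as separate processes and "merge" their results afterwards, but that depends on the environment you are running in, which the question does not mention.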

As an example of such "parallel" execution, you could build something along these lines somewhere in Meteor:

import { Meteor } from 'meteor/meteor';
import { Source } from '../imports/source';
import { Target } from '../imports/target';

Meteor.startup(async () => {
  // code to run on server at startup

  await Source.remove({});
  await Target.remove({});

  console.log('Removed');

  Source.insert({
    "_id" : "XBpNKbdGSgGfnC2MJ",
    "po" : 72134185,
    "machine" : 40940,
    "location" : "02A01",
    "inDate" : new Date("2017-07-19T06:10:13.059Z"),
    "requestDate" : new Date("2017-07-19T06:17:04.901Z"),
    "outDate" : new Date("2017-07-19T06:30:34Z")
  });

  console.log('Inserted');

  await Promise.all(
    ["In","Out"].map( f => new Promise((resolve,reject) => {
      let cursor = Source.rawCollection().aggregate([
        { "$match": { [`${f.toLowerCase()}Date`]: { "$exists": true } } },
        { "$group": {
          "_id": {
            "year": { "$year": `$${f.toLowerCase()}Date` },
            "month": { "$month": `$${f.toLowerCase()}Date` },
            "day": { "$dayOfYear": `$${f.toLowerCase()}Date` }
          },
          [`count${f}`]: { "$sum": 1 }
        }}
      ]);

      cursor.on('data', async (data) => {
        cursor.pause();
        data.date = data._id;
        delete data._id;
        await Target.upsert(
          { date: data.date },
          { "$set": data }
        );
        cursor.resume();
      });

      cursor.on('end', () => resolve('done'));
      cursor.on('error', (err) => reject(err));
    }))
  );

  console.log('Mapped');

  let targets = await Target.find().fetch();
  console.log(targets);

});

This essentially writes documents like the following to the target collection, as mentioned in the comments:

{
        "_id" : "XdPGMkY24AcvTnKq7",
        "date" : {
                "year" : 2017,
                "month" : 7,
                "day" : 200
        },
        "countIn" : 1,
        "countOut" : 1
}
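The "merge" half of that approach can be sketched without Meteor. Each per-day result from the two separate aggregations is upserted into a target keyed by its date, so `countIn` and `countOut` end up on the same document. The helper `mergeByDate` below is a hypothetical plain-JavaScript stand-in for the `Target.upsert({ date }, { $set })` calls. It is not part of Meteor's API.

```javascript
// Hypothetical stand-in for the Target.upsert(...) loop: merges per-day
// aggregation results that share the same _id into one document.
function mergeByDate(...resultSets) {
  const target = new Map();
  for (const results of resultSets) {
    for (const { _id, ...counts } of results) {
      // The upsert matches on the whole { year, month, day } sub-document;
      // JSON.stringify gives an equivalent key as long as field order is the same.
      const key = JSON.stringify(_id);
      const existing = target.get(key) || { date: _id };
      // $set semantics: add new fields, overwrite fields that already exist
      target.set(key, { ...existing, ...counts });
    }
  }
  return [...target.values()];
}

// Per-day results shaped like the outputs of the "In" and "Out" aggregations
const inResults = [{ _id: { year: 2017, month: 7, day: 200 }, countIn: 1 }];
const outResults = [{ _id: { year: 2017, month: 7, day: 200 }, countOut: 1 }];
console.log(JSON.stringify(mergeByDate(inResults, outResults)));
// [{"date":{"year":2017,"month":7,"day":200},"countIn":1,"countOut":1}]
```

A day that appears in only one of the two result sets keeps only one count field. The Meteor version stores the same shape, because `$set` only writes the fields that are present.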

Answer 2

Riiiight. I came up with the following query. Admittedly, I have seen simpler and prettier things in my life, but it does get the job done:

db.getCollection('test').aggregate([
  {
    $facet: // split aggregation into two pipelines
    {
      "in": [
        { "$match": { "inDate": { "$ne": null } } }, // get rid of null values
        { $group: { "_id": { "y": { "$year": "$inDate" }, "m": { "$month": "$inDate" }, "d": { "$dayOfMonth": "$inDate" } }, "cIn": { $sum : 1 } } } // compute sum per inDate
      ],
      "out": [
        { "$match": { "outDate": { "$ne": null } } }, // get rid of null values
        { $group: { "_id": { "y": { "$year": "$outDate" }, "m": { "$month": "$outDate" }, "d": { "$dayOfMonth": "$outDate" } }, "cOut": { $sum : 1 } } } // compute sum per outDate
      ]
    }
  },
  { $project: { "result": { $setUnion: [ "$in", "$out" ] } } }, // merge results into new array
  { $unwind: "$result" }, // unwind array into individual documents
  { $replaceRoot: { newRoot: "$result" } }, // get rid of the additional field level
  { $group: { _id: { year: "$_id.y", "month": "$_id.m", "day": "$_id.d" }, "countIn": { $sum: "$cIn" }, "countOut": { $sum: "$cOut" } } } // group into final result
])
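The last four stages are where the two facets are stitched back together. A plain-JavaScript sketch of that regrouping shows why it works. Every entry carries either `cIn` or `cOut`, and `$sum` skips the missing field, so each day ends up with both totals. The name `regroupFacets` is hypothetical. Simple concatenation stands in for `$setUnion` here. That is only equivalent because entries from the two facets are never identical (one has `cIn`, the other `cOut`), and each facet already has a single entry per day.

```javascript
// Hypothetical emulation of the $project/$unwind/$replaceRoot/$group tail of
// the $facet query, given the single document that $facet produces.
function regroupFacets(facetDoc) {
  const groups = new Map();
  // $setUnion + $unwind + $replaceRoot: one flat list of facet entries
  for (const entry of [...facetDoc.in, ...facetDoc.out]) {
    const { y, m, d } = entry._id;
    const key = `${y}-${m}-${d}`;
    if (!groups.has(key)) {
      groups.set(key, { _id: { year: y, month: m, day: d }, countIn: 0, countOut: 0 });
    }
    // $sum skips missing values, so an absent cIn/cOut adds nothing
    const group = groups.get(key);
    group.countIn += entry.cIn ?? 0;
    group.countOut += entry.cOut ?? 0;
  }
  return [...groups.values()];
}

// Example facet output (illustrative numbers only)
const facetDoc = {
  in: [
    { _id: { y: 2017, m: 7, d: 19 }, cIn: 3 },
    { _id: { y: 2017, m: 7, d: 21 }, cIn: 2 },
  ],
  out: [{ _id: { y: 2017, m: 7, d: 19 }, cOut: 1 }],
};
console.log(JSON.stringify(regroupFacets(facetDoc)));
// [{"_id":{"year":2017,"month":7,"day":19},"countIn":3,"countOut":1},{"_id":{"year":2017,"month":7,"day":21},"countIn":2,"countOut":0}]
```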

As always with MongoDB aggregations, you can see what is going on by removing the stages one at a time, starting from the end of the query.
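One way to follow that advice mechanically is to keep the stages in an array and run ever shorter versions of it. The helper below is a hypothetical sketch. It only builds the truncated pipelines and does not talk to the database.

```javascript
// Hypothetical helper: drop stages from the end one at a time and return the
// truncated pipelines, longest first, so each can be run and inspected.
function truncatedPipelines(pipeline) {
  const result = [];
  for (let k = pipeline.length - 1; k >= 1; k--) {
    result.push(pipeline.slice(0, k));
  }
  return result;
}

const stages = [{ $facet: {} }, { $project: {} }, { $unwind: "$result" }];
console.log(truncatedPipelines(stages).map((p) => p.length)); // [ 2, 1 ]
```

In the mongo shell, each truncated pipeline could then be run with something like `truncatedPipelines(pipeline).forEach(p => printjson(db.getCollection('test').aggregate(p).toArray()))`, where `pipeline` holds the stages of the query above as an array.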

Edit

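As you can see in the comments below, there was some discussion about the document size limit and about how generally applicable this solution is.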

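So let's look at these aspects in more detail, and compare the performance of the $facet-based solution with that of the $map-based solution (suggested by @NeilLunn to avoid potential document size issues).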

I created 2 million test records with random dates assigned to the "inDate" and "outDate" fields:

{ "_id" : ObjectId("597857e0fa37b3f66959571a"), "inDate" : ISODate("2016-07-29T22:00:00.000Z"), "outDate" : ISODate("1988-07-14T22:00:00.000Z") }

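The data covers the range from 1 January 1970 to 1 January 2001, 29220 distinct days in total. With 2 million test records randomly spread over this period, both queries should return all 29220 possible results (and both did).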
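The answer does not show how the test data was generated. A minimal sketch of such a generator, assuming independent, uniformly random timestamps for both fields, could look like this. `randomDate` and `makeTestDocs` are hypothetical names.

```javascript
// Hypothetical test-data generator: each document gets an independent,
// uniformly random inDate and outDate in [start, end).
function randomDate(start, end) {
  const span = end.getTime() - start.getTime();
  return new Date(start.getTime() + Math.floor(Math.random() * span));
}

function makeTestDocs(n, start, end) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ inDate: randomDate(start, end), outDate: randomDate(start, end) });
  }
  return docs;
}

const sample = makeTestDocs(3, new Date("1970-01-01T00:00:00Z"), new Date("2001-01-01T00:00:00Z"));
console.log(sample);
```

In the mongo shell, two million such documents would typically be inserted in batches, for example with repeated `db.getCollection('test').insertMany(makeTestDocs(10000, start, end))` calls. The batch size here is arbitrary.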

After restarting my single MongoDB instance, I ran both queries five times each; the results in milliseconds were:

$facet: 5663, 5400, 5380, 5460, 5520

$map: 9648, 9134, 9058, 9085, 9132
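Averaging the five runs gives one figure per approach. Here is a small calculation over the timings above:

```javascript
// Timings reported above, in milliseconds
const facetMs = [5663, 5400, 5380, 5460, 5520];
const mapMs = [9648, 9134, 9058, 9085, 9132];

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

const facetMean = mean(facetMs); // 5484.6
const mapMean = mean(mapMs); // 9211.4
console.log(facetMean, mapMean, (mapMean / facetMean).toFixed(2)); // 5484.6 9211.4 1.68
```

In this single-instance setup, the $map-based pipeline therefore took roughly 1.7 times as long on average.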

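I also measured the size of the single document returned by the $facet stage: 3.19 MB, so far away from the MongoDB document size limit (16 MB at the time of writing). That limit, however, only applies to the result document anyway and is not a problem during pipeline processing.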
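To get a feel for how much headroom that leaves, here is a rough extrapolation from the measured 3.19 MB. It is only a sketch under loud assumptions: "MB" means MiB (2^20 bytes); the $facet document holds one entry per day per facet (2 × 29220 entries); and its size grows linearly with that entry count. Under these assumptions, each entry costs about 57 bytes. The single $facet document would then hit the limit at roughly five times as many grouped entries as in this test, somewhere around 290,000 in total.

```javascript
// Back-of-envelope extrapolation; every input below is an assumption:
//  - "MB" in the measurement means MiB (2^20 bytes)
//  - the $facet document holds one entry per day per facet (2 x 29220)
//  - its size grows linearly with the number of entries
const MiB = 1024 * 1024;
const measuredBytes = 3.19 * MiB;
const entries = 2 * 29220;
const limitBytes = 16 * MiB; // 16 MB BSON document size limit

const bytesPerEntry = measuredBytes / entries; // about 57 bytes
const maxEntries = Math.floor(limitBytes / bytesPerEntry);

console.log(bytesPerEntry.toFixed(1), maxEntries, (maxEntries / entries).toFixed(1));
```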

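Bottom line: if you want performance, use the solution suggested here. Be mindful of the document size limit, though, especially if your use case differs from the one described in the question (e.g. when you need to collect more or larger data). Also, I am not sure whether both solutions still show the same performance characteristics in a sharded setup...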

Group multiple date fields by day
