Question

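I am grouping a collection of files by their last-access time (based on the data I have in Mongo).

However, I'm not sure how to return more of each document from the group operation.

How can I get documents that contain all of the information for each group?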

For example, this is what I currently get back:

df=pd.DataFrame({'occurance':[1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0],'value':[45, 3, 2, 12, 14, 32, 1, 1, 6, 4, 9, 32, 78, 96, 12, 6, 3],'group':[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 4, 100, 5, 5, 5, 5]})
df

I can only get the file names back in a plain array. What I want is the documents, not just the names.

The code I used to produce the result above:

bs=df[df.occurance.eq(1).any(1)&df.occurance.shift(-1).eq(0).any(1)].squeeze()
bs

I would like my result values to look more like:

[
  {
    "year": [
      2020
    ],
    "month": [
      2
    ],
    "week": [
      7
    ],
    "day": [
      2
    ],
    "results": [
      "filename-1",
      "filename-2"
    ]
  }
]

Thank you very much for your help.

Answer 1

Change your group stage to this:

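import org.bson.Document;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.GroupOperation;

// Collect a sub-document per file (not just its name) into "results".
GroupOperation group = Aggregation
    .group("year", "month", "week", "day")
    .addToSet(new Document("id", "$id")
                   .append("filename", "$name")
                   .append("size", "$size"))
    .as("results");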

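Note: make sure the size field is included in the dateProjection stage. A group stage can only reference fields that the preceding projection passes through.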
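To make the change in result shape concrete, here is a minimal plain-Java sketch that mimics what the group stage above does, without a MongoDB driver or Spring. The class `GroupSketch`, the helpers `doc` and `groupByDate`, and the sample values are all hypothetical and exist only for illustration. The real work is done by the aggregation pipeline on the server.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupSketch {

    // Hypothetical stand-in for one document leaving the projection stage.
    public static Map<String, Object> doc(String id, String name, long size,
                                          int year, int month, int week, int day) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("name", name);
        d.put("size", size);
        d.put("year", year);
        d.put("month", month);
        d.put("week", week);
        d.put("day", day);
        return d;
    }

    // Mimics grouping on (year, month, week, day) and adding a
    // sub-document {id, filename, size} to a set for each group.
    public static Map<List<Object>, Set<Map<String, Object>>> groupByDate(
            List<Map<String, Object>> docs) {
        Map<List<Object>, Set<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> d : docs) {
            List<Object> key = Arrays.asList(
                d.get("year"), d.get("month"), d.get("week"), d.get("day"));
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("id", d.get("id"));
            entry.put("filename", d.get("name"));
            entry.put("size", d.get("size"));
            // Set semantics: an identical sub-document is stored only once.
            groups.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(entry);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(
            doc("1", "filename-1", 1024L, 2020, 2, 7, 2),
            doc("2", "filename-2", 2048L, 2020, 2, 7, 2),
            doc("3", "filename-3", 512L, 2020, 2, 7, 3));
        groupByDate(docs).forEach(
            (key, results) -> System.out.println(key + " -> " + results));
    }
}
```

Each group now carries a set of small documents instead of a set of bare file names, which is the shape the question asks for. If a field such as size were dropped before grouping, the corresponding value here would simply be missing, which is why the note about the projection stage matters.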

MongoDB group operation: return documents instead of single fields
