I have a Mongo collection that contains the following documents:
{
    "_id" : ObjectId("5a9d0d44c3a1ce5f14c6940a"),
    "topic_id" : "5a7af30613b79405643e7da1",
    "value" : "VMware Virtual Platform",
    "timestamp" : "2018-03-05 09:26:25.136546",
    "insert_ts" : "2018-03-05 09:26:25.136682",
    "inserted_by" : 1
},
{
    "_id" : ObjectId("5a9d0d44c3a1ce5f14c69409"),
    "topic_id" : "5a7af30713b79479f82b4b84",
    "value" : "VMware, Inc.",
    "timestamp" : "2018-03-05 09:26:25.118931",
    "insert_ts" : "2018-03-05 09:26:25.119081",
    "inserted_by" : 1
},
{
    "_id" : ObjectId("5a9d0d44c3a1ce5f14c69408"),
    "topic_id" : "5a7af30713b7946d6d0a8772",
    "value" : "Phoenix Technologies LTD 6.00 09/21/2015",
    "timestamp" : "2018-03-05 09:26:25.101624",
    "insert_ts" : "2018-03-05 09:26:25.101972",
    "inserted_by" : 1
}
I want to get some aggregated data from this collection: the oldest timestamp, the document count, and the total string length of all values, grouped by topic_id, for documents whose _id is greater than x.
In MySQL, I would write a query like this:
SELECT
    MAX(_id) as max_id,
    COUNT(*) as message_count,
    MIN(timestamp) as min_timestamp,
    LENGTH(GROUP_CONCAT(value)) as size
FROM `dev_topic_data_numeric`
WHERE _id > 22000
GROUP BY topic_id
How can I achieve this in MongoDB? I have already tried building it, and it looks like this:
db.getCollection('topic_data_text').aggregate(
    [
        {
            "$match":
            {
                "_id": {"$gte": ObjectId("5a9d0aefc3a1ce5f14c68c81") }
            }
        },
        {
            "$group":
            {
                "_id": "$topic_id",
                "max_id": {"$max":"$_id"},
                "min_timestamp": {"$min": "$timestamp"},
                "message_count": {"$sum": 1},
                /*"size": {"$strLenBytes": "$value" }*/
            }
        }
    ]
);
If I then uncomment the $strLenBytes line, it crashes, saying that strLenBytes is not a group operator. The MongoDB API documentation did not help me. How do I have to write this to get the string length?
My expected result is one document per topic_id with the same four fields as the SQL query: max_id, message_count, min_timestamp, and size.
My MongoDB version is 3.4.4.
Answer 0 (score: 1)
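This is because $strLenBytes is not an accumulator, unlike $sum or $max. The $group stage accumulates values, so any operator that appears at the top level of a $group field is generally an accumulator.
$strLenBytes transforms one value into another in a 1-to-1 manner. That is typically an operator for the $project stage.
Adding a $project stage to the aggregation should give you the result you want. Note that you also need to modify the $group stage slightly to pass the required value through. In the pipeline below, value becomes part of the group key, so the result contains one document per distinct topic_id and value pair: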
db.test.aggregate([
    {
        "$match":
        {
            "_id": {"$gte": ObjectId("5a9d0aefc3a1ce5f14c68c81") }
        }
    },
    {
        "$group":
        {
            "_id": {"topic_id": "$topic_id", value: "$value"},
            "max_id": {"$max":"$_id"},
            "min_timestamp": {"$min": "$timestamp"},
            "message_count": {"$sum": 1}
        }
    },
    {
        "$project":
        {
            "_id": "$_id.topic_id",
            "max_id": "$max_id",
            "min_timestamp": "$min_timestamp",
            "message_count": "$message_count",
            size: {"$strLenBytes": "$_id.value" }
        }
    }
])
Output with the example documents:
{
    "_id": "5a7af30613b79405643e7da1",
    "max_id": ObjectId("5a9d0d44c3a1ce5f14c6940a"),
    "min_timestamp": "2018-03-05 09:26:25.136546",
    "message_count": 1,
    "size": 23
}
{
    "_id": "5a7af30713b79479f82b4b84",
    "max_id": ObjectId("5a9d0d44c3a1ce5f14c69409"),
    "min_timestamp": "2018-03-05 09:26:25.118931",
    "message_count": 1,
    "size": 12
}
{
    "_id": "5a7af30713b7946d6d0a8772",
    "max_id": ObjectId("5a9d0d44c3a1ce5f14c69408"),
    "min_timestamp": "2018-03-05 09:26:25.101624",
    "message_count": 1,
    "size": 40
}
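The sample data has exactly one document per topic, so it does not show what happens when a topic has several different values. The following is a rough plain-JavaScript emulation (no MongoDB involved) of two grouping strategies: grouping by topic_id plus value, as in the pipeline above, and grouping by topic_id alone while summing the per-document byte length. The third document, the helper names (byteLength, groupByTopicAndValue, groupByTopic), and the use of plain strings instead of ObjectId are all assumptions made for illustration.

```javascript
// Two sample values from the question plus one HYPOTHETICAL extra document,
// added so that one topic carries two different values.
const docs = [
  { topic_id: "5a7af30613b79405643e7da1", value: "VMware Virtual Platform" },
  { topic_id: "5a7af30713b79479f82b4b84", value: "VMware, Inc." },
  { topic_id: "5a7af30713b79479f82b4b84", value: "VMware" }, // hypothetical
];

// UTF-8 byte length of a string, analogous to what $strLenBytes computes.
const byteLength = (s) => new TextEncoder().encode(s).length;

// Emulates grouping on {topic_id, value}, then a per-group 1-to-1 size step.
function groupByTopicAndValue(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const key = doc.topic_id + "\u0000" + doc.value;
    const g = groups.get(key) ||
      { topic_id: doc.topic_id, message_count: 0, size: byteLength(doc.value) };
    g.message_count += 1; // like {"$sum": 1}
    groups.set(key, g);
  }
  return [...groups.values()];
}

// Emulates grouping on topic_id only, summing the byte length of every value.
function groupByTopic(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const g = groups.get(doc.topic_id) ||
      { topic_id: doc.topic_id, message_count: 0, size: 0 };
    g.message_count += 1;
    g.size += byteLength(doc.value);
    groups.set(doc.topic_id, g);
  }
  return [...groups.values()];
}

const byTopicAndValue = groupByTopicAndValue(docs);
const byTopic = groupByTopic(docs);
console.log(byTopicAndValue.length, byTopic.length);
```

With this input, the first strategy yields a separate row for each distinct value of the second topic, while the second yields one row per topic with the sizes added up. If one row per topic_id is required, the first strategy would need a further grouping step.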
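Answer 1 (score: 0)
After testing @kevin-adistambha's answer and experimenting a bit further, I found another way to get the result I want. It may perform better, but that needs more testing to confirm.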
db.getCollection('topic_data_text').aggregate(
    [
        {
            "$match":
            {
                "_id": {"$gt": ObjectId("5a9f9d8bd5de3ac75f8cc269") }
            }
        },
        {
            "$group":
            {
                "_id": "$topic_id",
                "max_id": {"$max":"$_id"},
                "min_timestamp": {"$min": "$timestamp"},
                "message_count": {"$sum": 1},
                "size": {"$sum": {"$strLenBytes": "$value"}}
            }
        }
    ]
);
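This works because an accumulator such as $sum takes an expression as its argument, so $strLenBytes is evaluated per document and only its result is accumulated. A possible variant, sketched below, computes the length in a separate $addFields stage and guards against a missing or null value with $ifNull, since $strLenBytes raises an error for those. The function name buildPipeline, the field name value_bytes, and the placeholder id are assumptions for illustration, and the sketch assumes value is otherwise always a string. Whether it performs differently from the single-stage version has not been measured here.

```javascript
// Builds the aggregation pipeline as a plain array so it can be inspected
// without a database. `lowerBoundId` is whatever _id the caller filters on.
function buildPipeline(lowerBoundId) {
  return [
    { $match: { _id: { $gt: lowerBoundId } } },
    // Per-document step: byte length of value, treating missing/null as "".
    // `value_bytes` is a hypothetical helper field.
    { $addFields: { value_bytes: { $strLenBytes: { $ifNull: ["$value", ""] } } } },
    {
      $group: {
        _id: "$topic_id",
        max_id: { $max: "$_id" },
        min_timestamp: { $min: "$timestamp" },
        message_count: { $sum: 1 },
        size: { $sum: "$value_bytes" }, // accumulate the precomputed lengths
      },
    },
  ];
}

// In the mongo shell this could be used as follows (not executed here):
// db.getCollection('topic_data_text').aggregate(
//     buildPipeline(ObjectId("5a9f9d8bd5de3ac75f8cc269")));
const pipeline = buildPipeline("placeholder-id");
console.log(JSON.stringify(pipeline, null, 2));
```

Note that the total can differ slightly from the MySQL query in the question: LENGTH(GROUP_CONCAT(value)) also counts the comma separators that GROUP_CONCAT inserts between values, whereas summing $strLenBytes counts only the values themselves.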