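My goal is to return multiple questionElements whose metaTags entry equals my search string. For example, if a metaTags element equals my string, return its parent questionEntry element, searching across all elements nested in show.
So what I want is to match documents that contain the required "metaTags" value AND "filter out" any sub-document arrays that do not contain this inner match.
This is what I tried as an aggregation query with $redact, but it does not give the result I want: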
db.mongoColl.aggregate([{"$redact":{"$cond": { if: {$gt:[ {"$size": {
$setIntersection : [ { "$ifNull": [ "$metaTags", []]},
["MySearchString"]]} } , 0 ]} , then:"$$PRUNE",
else:"$$DESCEND" }}}]).pretty();
My setup is:
private DB mongoDatabase;
private DBCollection mongoColl;
private DBObject dbObject;
// Singleton class
// Create client (server address(host,port), credential, options)
mongoClient = new MongoClient(new ServerAddress(host, port),
Collections.singletonList(credential),
options);
mongoDatabase = ClientSingleton.getInstance().getClient().getDB("MyDB");
The document in the database that I want to match is:
{
"show":[
{
"season":[
{
"episodes":[
{
"questionEntry":{
"id":1,
"info":{
"seasonNumber":1,
"episodeNumber":5,
"episodeName":"A Hero Sits Next Door"
},
"questionItem":{
"theQuestion":"What is the name of the ringer hired by Mr. Weed?",
"attachedElement":{
"type":1,
"value":""
}
},
"options":[
{
"type":1,
"value":"Johnson"
},
{
"type":1,
"value":"Hideo"
},
{
"type":1,
"value":"Guillermo"
}
],
"answer":{
"questionId":1,
"answer":3
},
"metaTags":[
"Season 1",
"Episode 5",
"Trivia",
"Arya Stark",
"House Stark"
]
}
}
]
}
]
}
]
}
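However, if any array in the document does not contain the "metaTags" value to match, i.e. "Arya Stark", then I do not want any elements of that array in the result at all. The "metaTags" themselves can stay as they are.
I am running the latest Java driver and using the Java SE 1.7 compiler in Eclipse, if that makes any difference to the answer.
Answer 0 (score: 2)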
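The $redact operator is really not the best choice here, and the fact that its logic is so simple is the main reason the attempted query does not work. Redaction is pretty much an "all or nothing" decision made on one specific condition at each level, and the only way to traverse deeper levels of the document is for that condition to return $$DESCEND.
At best you get a lot of "false positives" by substituting a value where the tested field does not exist. At worst you end up removing the entire document when it should have been a match. It has its uses, but this is not one of them.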
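To make the "all or nothing" point concrete, here is a minimal plain-Java imitation of how $redact walks a document. It is only a sketch: the class `RedactSketch`, its `Action` enum and the two-episode sample are made up for illustration, it uses maps and lists instead of the driver, and it is not how the server implements the operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of $redact for illustration; not the server's implementation.
final class RedactSketch {
    enum Action { KEEP, PRUNE, DESCEND }

    // Stand-in for the $cond test: does this level have the tag in "metaTags"?
    static boolean hasTag(Map<String, Object> level, String tag) {
        Object tags = level.get("metaTags"); // a missing field behaves like $ifNull -> []
        return tags instanceof List && ((List<?>) tags).contains(tag);
    }

    // Returns the redacted copy of one document level, or null when it is pruned.
    @SuppressWarnings("unchecked")
    static Map<String, Object> redact(Map<String, Object> level, String tag,
                                      Action onMatch, Action otherwise) {
        Action action = hasTag(level, tag) ? onMatch : otherwise;
        if (action == Action.PRUNE) return null;
        if (action == Action.KEEP) return level;
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : level.entrySet()) {
            Object value = field.getValue();
            if (value instanceof Map) {
                Map<String, Object> sub = redact((Map<String, Object>) value, tag, onMatch, otherwise);
                if (sub != null) out.put(field.getKey(), sub);
            } else if (value instanceof List) {
                List<Object> kept = new ArrayList<>();
                for (Object item : (List<Object>) value) {
                    if (!(item instanceof Map)) {
                        kept.add(item); // scalars inside arrays are left alone
                    } else {
                        Map<String, Object> sub = redact((Map<String, Object>) item, tag, onMatch, otherwise);
                        if (sub != null) kept.add(sub);
                    }
                }
                out.put(field.getKey(), kept);
            } else {
                out.put(field.getKey(), value);
            }
        }
        return out;
    }

    static Map<String, Object> episode(int id, String... tags) {
        Map<String, Object> ep = new LinkedHashMap<>();
        ep.put("_id", id);
        ep.put("metaTags", Arrays.asList(tags));
        return ep;
    }

    // Hypothetical two-episode document, much smaller than the real data.
    static Map<String, Object> sample() {
        Map<String, Object> show = new LinkedHashMap<>();
        show.put("episodes", Arrays.<Object>asList(episode(1, "Arya Stark"), episode(2, "John Snow")));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("show", Arrays.<Object>asList(show));
        return doc;
    }

    public static void main(String[] args) {
        // The question's condition: PRUNE on a match, DESCEND otherwise.
        System.out.println(redact(sample(), "Arya Stark", Action.PRUNE, Action.DESCEND));
        // The "obvious" inversion: KEEP on a match, PRUNE otherwise.
        System.out.println(redact(sample(), "Arya Stark", Action.KEEP, Action.PRUNE));
    }
}
```

In this sketch the question's condition removes exactly the matching episode, while the inverted condition prunes the whole document, because the top level has no "metaTags" field at all.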
First, a simplified sample based on your structure. This is mainly so that we can visualise what we want to filter out of the content:
{
"show": [
{
"name": "Game of Thrones",
"season": [
{
"_id": 1,
"episodes": [
{
"_id": 1,
"metaTags": [
"Arya Stark"
]
},
{
"_id": 2,
"metaTags": [
"John Snow"
]
}
]
},
{
"_id": 2,
"episodes": [
{
"_id": 1,
"metaTags": [
"Arya Stark"
]
}
]
}
]
},
{
"name": "Seinfeld",
"season": [
{
"_id": 1,
"episodes": [
{
"_id": 1,
"metaTags": [
"Jerry Seinfeld"
]
}
]
}
]
}
]
}
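There are two ways to get the result here. The first is the traditional approach: use $unwind to flatten the arrays, filter with $match and conditional expressions, and then use several $group stages to rebuild the arrays: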
db.sample.aggregate([
{ "$match": {
"show.season.episodes.metaTags": "Arya Stark"
}},
{ "$unwind": "$show" },
{ "$unwind": "$show.season" },
{ "$unwind": "$show.season.episodes" },
{ "$unwind": "$show.season.episodes.metaTags" },
{ "$group": {
"_id": {
"_id": "$_id",
"show": {
"name": "$show.name",
"season": {
"_id": "$show.season._id",
"episodes": {
"_id": "$show.season.episodes._id",
}
}
}
},
"metaTags": { "$push": "$show.season.episodes.metaTags" },
"matched": {
"$sum": {
"$cond": [
{ "$eq": [ "$show.season.episodes.metaTags", "Arya Stark" ] },
1,
0
]
}
}
}},
{ "$sort": { "_id._id": 1, "_id.show.season.episodes._id": 1 } },
{ "$group": {
"_id": {
"_id": "$_id._id",
"show": {
"name": "$_id.show.name",
"season": {
"_id": "$_id.show.season._id",
},
}
},
"episodes": {
"$push": {
"$cond": [
{ "$gt": [ "$matched", 0 ] },
{
"_id": "$_id.show.season.episodes._id",
"metaTags": "$metaTags"
},
false
]
}
}
}},
{ "$unwind": "$episodes" },
{ "$match": { "episodes": { "$ne": false } } },
{ "$group": {
"_id": "$_id",
"episodes": { "$push": "$episodes" }
}},
{ "$sort": { "_id._id": 1, "_id.show.season._id": 1 } },
{ "$group": {
"_id": {
"_id": "$_id._id",
"show": {
"name": "$_id.show.name"
}
},
"season": {
"$push": {
"_id": "$_id.show.season._id",
"episodes": "$episodes"
}
}
}},
{ "$group": {
"_id": "$_id._id",
"show": {
"$push": {
"name": "$_id.show.name",
"season": "$season"
}
}
}}
])
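This is all well and good, and fairly easy to understand. However, using $unwind here creates a lot of overhead, particularly when we are only filtering within each document and not grouping across documents.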
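As a rough illustration of where that overhead comes from, the following plain-Java sketch imitates a single $unwind on a top-level array field. The class `UnwindSketch` and its helpers are hypothetical and work on maps and lists rather than the driver; the tag values are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnwindSketch {
    // Imitates one $unwind on a top-level array field: one output copy per element.
    static List<Map<String, Object>> unwind(List<Map<String, Object>> docs, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (!(value instanceof List)) continue; // simplification: non-array values are skipped
            for (Object element : (List<?>) value) {
                Map<String, Object> copy = new LinkedHashMap<>(doc); // every other field is duplicated
                copy.put(field, element);
                out.add(copy);
            }
        }
        return out;
    }

    // One episode carrying the five tags from the question's document.
    static Map<String, Object> sampleEpisode() {
        Map<String, Object> episode = new LinkedHashMap<>();
        episode.put("_id", 1);
        episode.put("metaTags",
                Arrays.asList("Season 1", "Episode 5", "Trivia", "Arya Stark", "House Stark"));
        return episode;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(sampleEpisode());
        List<Map<String, Object>> rows = unwind(input, "metaTags");
        System.out.println(rows.size() + " documents after one $unwind");
        for (Map<String, Object> row : rows) {
            System.out.println(row);
        }
    }
}
```

One episode becomes five documents here, and the pipeline above chains four $unwind stages, so the intermediate document count grows with the product of the array sizes before the $group stages shrink it again.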
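There is a modern approach, but be warned that while it is efficient, it is an absolute "monster", and it is easy to get lost in the logic when dealing with embedded arrays: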
db.sample.aggregate([
{ "$match": {
"show.season.episodes.metaTags": "Arya Stark"
}},
{ "$project": {
"show": {
"$setDifference": [
{ "$map": {
"input": "$show",
"as": "show",
"in": {
"$let": {
"vars": {
"season": {
"$setDifference": [
{ "$map": {
"input": "$$show.season",
"as": "season",
"in": {
"$let": {
"vars": {
"episodes": {
"$setDifference": [
{ "$map": {
"input": "$$season.episodes",
"as": "episode",
"in": {
"$cond": [
// $setIsSubset is true when the FIRST array is contained in the second
{ "$setIsSubset": [
["Arya Stark"],
"$$episode.metaTags"
]},
"$$episode",
false
]
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": [ "$$episodes", [] ] },
{
"_id": "$$season._id",
"episodes": "$$episodes"
},
false
]
}
}
}
}},
[false]
]
}
},
"in": {
"$cond": [
{ "$ne": ["$$season", [] ] },
{
"name": "$$show.name",
"season": "$$season"
},
false
]
}
}
}
}},
[false]
]
}
}}
])
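There is a lot of array processing going on here: a $map for each array, a variable declaration with $let at each level, and a $setDifference around each $map, since we are both "filtering" content and testing for empty arrays.
Because it uses a single $project pipeline stage after the initial query match, this is much faster than the previous process.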
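If the nesting is hard to follow, it may help to see the same three-level logic as ordinary loops. This is only a plain-Java sketch over maps and lists, not driver code: `NestedFilterSketch` and its helper names are invented for illustration. Each `filter...` method plays the role of one `$map` plus `$setDifference [false]` pair, and the `isEmpty()` checks mirror the `$ne: [ ..., [] ]` tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedFilterSketch {
    // Builds a two-field map that keeps insertion order.
    static Map<String, Object> doc(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> list(Object value) {
        return (List<Map<String, Object>>) value;
    }

    // Innermost level: keep an episode only when its metaTags contain the tag.
    static List<Map<String, Object>> filterEpisodes(List<Map<String, Object>> episodes, String tag) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> episode : episodes) {
            Object tags = episode.get("metaTags");
            if (tags instanceof List && ((List<?>) tags).contains(tag)) kept.add(episode);
        }
        return kept;
    }

    // Middle level: a season survives only if some episode survived.
    static List<Map<String, Object>> filterSeasons(List<Map<String, Object>> seasons, String tag) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> season : seasons) {
            List<Map<String, Object>> episodes = filterEpisodes(list(season.get("episodes")), tag);
            if (!episodes.isEmpty()) kept.add(doc("_id", season.get("_id"), "episodes", episodes));
        }
        return kept;
    }

    // Outer level: a show survives only if some season survived.
    static List<Map<String, Object>> filterShows(List<Map<String, Object>> shows, String tag) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> show : shows) {
            List<Map<String, Object>> seasons = filterSeasons(list(show.get("season")), tag);
            if (!seasons.isEmpty()) kept.add(doc("name", show.get("name"), "season", seasons));
        }
        return kept;
    }

    // The simplified "show" array from the sample document.
    static List<Map<String, Object>> sample() {
        Map<String, Object> got = doc("name", "Game of Thrones", "season", Arrays.asList(
                doc("_id", 1, "episodes", Arrays.asList(
                        doc("_id", 1, "metaTags", Arrays.asList("Arya Stark")),
                        doc("_id", 2, "metaTags", Arrays.asList("John Snow")))),
                doc("_id", 2, "episodes", Arrays.asList(
                        doc("_id", 1, "metaTags", Arrays.asList("Arya Stark"))))));
        Map<String, Object> seinfeld = doc("name", "Seinfeld", "season", Arrays.asList(
                doc("_id", 1, "episodes", Arrays.asList(
                        doc("_id", 1, "metaTags", Arrays.asList("Jerry Seinfeld"))))));
        return Arrays.asList(got, seinfeld);
    }

    public static void main(String[] args) {
        System.out.println(filterShows(sample(), "Arya Stark"));
    }
}
```

The aggregation version does the same work on the server, which is the point of using it: only the filtered document travels back to the client.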
Both produce the same filtered result:
{
"_id" : ObjectId("55b3455e64518e494632fa16"),
"show" : [
{
"name" : "Game of Thrones",
"season" : [
{
"_id" : 1,
"episodes" : [
{
"_id" : 1,
"metaTags" : [
"Arya Stark"
]
}
]
},
{
"_id" : 2,
"episodes" : [
{
"_id" : 1,
"metaTags" : [
"Arya Stark"
]
}
]
}
]
}
]
}
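All of the "show", "season" and "episodes" arrays are completely filtered of any documents that do not match the inner "metaTags" condition. The "metaTags" array itself is untouched; it is only tested for a match with $setIsSubset, and only in order to filter out non-matching "episodes" array content.
Converting this for the Java driver is a fairly straightforward process, since the pipeline is just a data-structure representation of objects and lists. In the same way, you build the same structure in Java using standard lists and BSON Document objects. It is basically all list and map syntax: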
MongoDatabase db = mongoClient.getDatabase("test");
MongoCollection<Document> collection = db.getCollection("sample");
String searchString = "Arya Stark";
List<Document> pipeline = Arrays.<Document>asList(
new Document("$match",
new Document("show.season.episodes.metaTags",searchString)
),
new Document("$project",
new Document("show",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$show")
.append("as","show")
.append("in",
new Document("$let",
new Document("vars",
new Document("season",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$$show.season")
.append("as","season")
.append("in",
new Document("$let",
new Document("vars",
new Document("episodes",
new Document("$setDifference",
Arrays.<Object>asList(
new Document("$map",
new Document("input","$$season.episodes")
.append("as","episode")
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$setIsSubset",
Arrays.<Object>asList(
// first array must be contained in the second
Arrays.<Object>asList(searchString),
"$$episode.metaTags"
)
),
"$$episode",
false
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$ne",
Arrays.<Object>asList(
"$$episodes",
Arrays.<Object>asList()
)
),
new Document("_id","$$season._id")
.append("episodes","$$episodes"),
false
)
)
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
.append("in",
new Document("$cond",
Arrays.<Object>asList(
new Document("$ne",
Arrays.<Object>asList(
"$$season",
Arrays.<Object>asList()
)
),
new Document("name","$$show.name")
.append("season","$$season"),
false
)
)
)
)
)
),
Arrays.<Object>asList(false)
)
)
)
)
);
System.out.println(JSON.serialize(pipeline));
AggregateIterable<Document> result = collection.aggregate(pipeline);
MongoCursor<Document> cursor = result.iterator();
while (cursor.hasNext()) {
Document doc = cursor.next();
System.out.println(doc.toJson());
}
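As stated earlier, this is a "monster" of syntax, and it should give some insight into how difficult it is to deal with multiple levels of nested arrays in your documents. Anything beyond a single array is notoriously difficult to work with, and atomic updates are essentially impossible because of the limitations of the positional operator.
So this will work, and all you really need to add is the fact that "metaTags" is embedded within the "questionEntry" object, so use "questionEntry.metaTags" in place of "metaTags" wherever it appears. However, you might consider changing your schema away from this form in order to make life easier for coding and maintenance, and to make atomic updates possible.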
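As a small sketch of that substitution, the innermost condition can be built with the tag path as a parameter. Everything here is an assumption for illustration: `EpisodeConditionSketch`, `op` and `episodeCond` are made-up names, plain maps stand in for Document, the search list is placed first because $setIsSubset tests whether its first array is contained in the second, and the $ifNull guard (borrowed from the question's own attempt) is there for episodes that have no tags.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EpisodeConditionSketch {
    // Builds a single-key map such as { "$cond": [...] }.
    static Map<String, Object> op(String name, Object argument) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(name, argument);
        return m;
    }

    // Innermost $cond of the pipeline, with the tag path as a parameter.
    static Map<String, Object> episodeCond(String tagPath, String search) {
        // Fall back to an empty array when the episode has no tags at that path.
        Object tags = op("$ifNull",
                Arrays.<Object>asList("$$episode." + tagPath, Collections.emptyList()));
        // True when every searched tag is present in the episode's tags.
        Object test = op("$setIsSubset",
                Arrays.<Object>asList(Arrays.<Object>asList(search), tags));
        // Keep the episode on a match, otherwise emit false for $setDifference to drop.
        return op("$cond", Arrays.<Object>asList(test, "$$episode", false));
    }

    public static void main(String[] args) {
        Map<String, Object> cond = episodeCond("questionEntry.metaTags", "Arya Stark");
        List<?> parts = (List<?>) cond.get("$cond");
        System.out.println(parts.get(0));
    }
}
```

With the driver on the classpath, the resulting map could be wrapped with `new Document(map)` and dropped in as the "in" expression of the innermost $map.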
Answer 1 (score: 0)
You can use the following code to run the aggregation:
mongoClient = new MongoClient("127.0.0.1", 27017);
DB db = mongoClient.getDB("db_name");
DBCollection dbCollection = db.getCollection("collection_name");
//make aggregation pipeline here
List<DBObject> pipeline = new ArrayList<DBObject>();
AggregationOutput output = dbCollection.aggregate(pipeline);
Iterable<DBObject> results = output.results();
//iterate the results and convert each DBObject to your POJO
You can cast the DBObject to a POJO, or get a value from the DBObject with the following method:
dbObject.get("key");
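A minimal sketch of the manual conversion follows. `EpisodeInfo` is a hypothetical POJO whose fields mirror the "info" sub-document from the question, and a plain Map stands in for the DBObject so the example runs without the driver (with the legacy API, the map could come from something like `dbObject.toMap()`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical POJO for the "info" sub-document in the question.
final class EpisodeInfo {
    final int seasonNumber;
    final int episodeNumber;
    final String episodeName;

    EpisodeInfo(int seasonNumber, int episodeNumber, String episodeName) {
        this.seasonNumber = seasonNumber;
        this.episodeNumber = episodeNumber;
        this.episodeName = episodeName;
    }

    // Converts a key/value view of the stored document into the POJO.
    static EpisodeInfo fromMap(Map<String, Object> info) {
        return new EpisodeInfo(
                // Numbers may arrive as Integer, Long or Double, so go through Number.
                ((Number) info.get("seasonNumber")).intValue(),
                ((Number) info.get("episodeNumber")).intValue(),
                (String) info.get("episodeName"));
    }

    public static void main(String[] args) {
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("seasonNumber", 1.0); // a Double, as a shell insert would store it
        info.put("episodeNumber", 5);
        info.put("episodeName", "A Hero Sits Next Door");
        EpisodeInfo pojo = fromMap(info);
        System.out.println(pojo.seasonNumber + "x" + pojo.episodeNumber + " " + pojo.episodeName);
    }
}
```

Going through `Number` avoids a ClassCastException when a value that looks like an integer was actually stored as a double.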