mongo(aggregate)如何使用2个交叉ID过滤对

时间:2017-05-06 12:02:50

标签: mongodb aggregation-framework

这是我的用例:

我从两个集合中执行连接聚合(使用$ lookup)A& B. 结果(在$ unwind之后)包含大量文档,每个文档包含来自连接的一对相关文档。由于这些对是自反的,我们得到如下的对:(_ id1,ns:[_ id2])和(_id2,ns [_id1])。

=>问题是如何摆脱其中一个等价对

让我们详细说明整个过程:

A = db.A;
A.drop();

nobjects = 100; window = 180.0;

for (i = 0; i < nobjects; i++) { obj = {'loc': [ (_rand()*2*window - window), (_rand()*2*window - window) ] }; A.insert(obj); }

print('A count=', A.count());
A.createIndex({ 'loc': '2d'});

ra = 0.; decl = 0.; ext = 50.0; bottomleft = [ ra - ext, decl - ext ]; topright = [ ra + ext, decl + ext ];

B = db.B;
B.drop();

p1 = [{$geoNear: {near: [0, 0], query: { 'loc': { $geoWithin: {$box: [bottomleft, topright] } } }, distanceField: 'dist', }}, {$out: 'B'},];

A.aggregate(p1, {allowDiskUse: true} );

print('B count=', B.count());

dx =   { $abs: {$subtract: [ {$arrayElemAt: ['$ns.loc', 0]}, {$arrayElemAt: ['$loc', 0]}] } };
dx2 =  { $multiply: [dx, dx] };
dy =   { $abs: {$subtract: [ {$arrayElemAt: ['$ns.loc', 1]}, {$arrayElemAt: ['$loc', 1]}] } };
dy2 =  { $multiply: [dy, dy] };
dist = { $sqrt:  { $add: [ dx2, dy2] } };

p2 = [ {$geoNear: { near: [0, 0], query: { loc: { $geoWithin: {$box: [bottomleft, topright] } } }, limit: nobjects, distanceField: 'dist', } },
       {$lookup: {'from':'B', localField:'A.loc', foreignField:'B.loc', as:'ns'} },
       {$unwind: '$ns'},
       {$redact: { $cond: [{ $eq: ['$_id', '$ns._id'] }, '$$PRUNE', '$$KEEP' ] } },
       {$addFields: {dist: dist} },
       {$project: {'_id': 1, 'ns._id':1}},
     ];

cursor = A.aggregate(p2, {allowDiskUse: true});

运行此脚本会导致:

> load('j.js')
A count= 100
B count= 6
true
> cursor
{ "_id" : ObjectId("5911c0445562f6b9815f9db5"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db2") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db5"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc0") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db5"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db6") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db5"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc8") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db5"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9da4") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db2"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db5") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db2"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc0") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db2"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db6") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db2"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc8") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db2"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9da4") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9dc0"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db5") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9dc0"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db2") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9dc0"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db6") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9dc0"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc8") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9dc0"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9da4") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db6"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db5") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db6"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9db2") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db6"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc0") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db6"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9dc8") } }
{ "_id" : ObjectId("5911c0445562f6b9815f9db6"), "ns" : { "_id" : ObjectId("5911c0445562f6b9815f9da4") } }

精确地看,结果包含如下文件:

{_id: 'aaa', ns: {_id:'bbb'} }
...
{_id: 'bbb', ns: {_id:'aaa'} }
...

当然这两个文件(id1,ns(id2))&amp; (id2,ns(id1))是等价的(因为这些对不需要被排序),因此我们只需要这两个中的一个。

我该怎么做这个过滤?

非常感谢任何建议

基督教

1 个答案:

答案 0 :(得分:0)

A.locB.loc不是有效字段,也是跨产品的原因。

您的$lookup应为{$lookup: {'from':'B', localField:'loc', foreignField:'loc', as:'ns'} }。这将导致左连接应该是。

因此,根据示例,聚合应该只在100个文档中生成(在$lookup阶段之后)(将geo部分放在一边),其中一些将在{{1}中匹配}和非匹配作为ns []数组返回。

所以查询的第二部分将是

ns

{ $lookup: {'from':'B', localField:'loc', foreignField:'loc', as:'ns'} } { $unwind: '$ns'}, //will also drop the `[]` ns { $redact: { $cond: [{ $gt: ['$_id', '$ns._id'] }, '$$PRUNE', '$$KEEP' ] } }, { $addFields: {dist: dist} }, { $project: {'_id': 1, 'ns._id':1} } 将负责放弃反身对中的一个。