找到$和 - 很长时间才能完成

时间:2017-02-25 04:39:40

标签: mongodb mongodb-query aggregation-framework

我使用的是MongoDb 3.4。

我在数据库中有两个集合 - 我们可以调用coll2field1。这两个集合有两个共同的领域 - 比如field2& coll3和其他不同的列。

我需要创建一个新的集合(例如coll2)文档,其中包含field3的所有字段以及两个字段 - 例如field4coll1 field1 1}}用于匹配field2值和&的文档coll1coll2之间的db.coll1.insert([ { "field1": 321, "field2": "12", "Car": "camry" }, { "field1": 321, "field2": "13", "Car": "camry" }, { "field1": 323, "field2": "12", "Car": "accord" }, { "field1": 324, "field2": "15", "Car": "Sunny" } ]) db.coll2.insert([ { "field1": 321, "field2": "12", "RegNo": "1122", "State": 'AZ' }, { "field1": 321, "field2": "13", "RegNo": "1123", "State": 'AZ' }, { "field1": 323, "field2": "12", "RegNo": "1124", "State": 'CA' } ]) [ { "field1": 321, "field2": "12", "Car": "camry", "RegNo": "1122", "State": 'AZ' }, { "field1": 321, "field2": "13", "Car": "camry", "RegNo": "1123", "State": 'AZ' }, { "field1": 323, "field2": "12", "Car": "accord", "RegNo": "1124", "State": 'CA' } ]

可重复的示例

var results = db.coll2.find({}, {_id: 0}).toArray();
for( var i = 0; i < results.length; i++) {
   result = results[i];
    var doc = db.coll1.findOne({$and: [{field1: result["field1"]},
    {field2: result["field2"]}] }, {_id: 0});
    if(doc && doc["Car"]) {
        result["Car"] = doc["Car"]
        db.coll3.insertOne(result);
    }
}

必需的输出(为简洁起见,不包括ID)

coll1

所以我这样做如下

coll2

但是,使用带有500k文档的<uses-feature android:name="android.hardware.camera" android:required="true" /> <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/> <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" android:maxSdkVersion="18" /> <provider android:name="android.support.v4.content.FileProvider" android:authorities="com.example.android.fileprovider" android:exported="false" android:grantUriPermissions="true"> <meta-data android:name="android.support.FILE_PROVIDER_PATHS" android:resource="@xml/file_paths"></meta-data> </provider> 和带有50k文档的<table border="1"> <tbody> <tr> <td class="td-modal">Name:</td> <td colspan=2>Randon name</td> </tr> <tr> <td class="td-modal">Project Name:</td> <td colspan=2>XYZ</td> </tr> <tr> <td class="td-modal">Proj #:</td> <td>12345</td> <td>67890</td> </tr> </tbody> </table> 完成此操作需要很长时间(很多小时)。为什么需要这么长时间?可以做些什么来加快速度?

2 个答案:

答案 0 :(得分:1)

问题的逻辑是要求麻烦。对于从第一个集合中获取的每个记录,逻辑在第二个集合中每次都进行完整的集合扫描。

尝试

https://docs.mongodb.com/v3.2/reference/operator/aggregation/lookup/

答案 1 :(得分:1)

使用 $lookup ,您可以实现相同的逻辑但效率更高,因为您将使用MongoDB的聚合框架,该框架使用其本机运算符以更高效的方式计算聚合方式。

考虑运行以下聚合操作以获得所需的结果:

db.coll2.aggregate([
    {
        "$lookup": {
            "from": "coll1",
            "localField": "field1",
            "foreignField": "field1",
            "as": "coll1Docs"
        }
    },
    {
        "$addFields": {
            "coll1Docs": {
                "$arrayElemAt": [
                    {
                        "$filter": {
                            "input": "$coll1Docs",
                            "as": "doc",
                            "cond": {  "$eq": [ "$field2", "$$doc.field2" ]  }                                        
                        }
                    },
                    0
                ]
            }
        } 
    },      
    { "$addFields": { "Car": "$coll1Docs.Car" } }
    { "$project": { "coll1Docs": 0 } },
    { "$out": "coll3" } 
])

然后运行db.coll3.findOne()进行确认。

说明

第一个管道 $lookup ,对同一数据库中的未整理的集合执行左外连接,以过滤来自&#34;加入&#34;的文档。收集处理。

它提供了来自输入文档的字段与来自&#34;加入&#34;的文档中的字段之间的相等匹配。集合,在本例中为coll2。对于每个输入文档, $lookup 阶段会添加一个新的数组字段,其元素是来自&#34;加入&#34;的匹配文档。采集。然后将这些重新塑造的文件传递到下一阶段。

仅使用第一阶段运行管道

db.coll2.aggregate([
    {
        "$lookup": {
            "from": "coll1",
            "localField": "field1",
            "foreignField": "field1",
            "as": "coll1Docs"
        }
    }
])

将产生

/* 1 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4de"),
    "field1" : 321.0,
    "field2" : "12",
    "RegNo" : "1122",
    "State" : "AZ",
    "coll1Docs" : [ 
        {
            "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
            "field1" : 321.0,
            "field2" : "12",
            "Car" : "camry"
        }, 
        {
            "_id" : ObjectId("58b179a730c6c0da2c31a4db"),
            "field1" : 321.0,
            "field2" : "13",
            "Car" : "camry"
        }
    ]
}

/* 2 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4df"),
    "field1" : 321.0,
    "field2" : "13",
    "RegNo" : "1123",
    "State" : "AZ",
    "coll1Docs" : [ 
        {
            "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
            "field1" : 321.0,
            "field2" : "12",
            "Car" : "camry"
        }, 
        {
            "_id" : ObjectId("58b179a730c6c0da2c31a4db"),
            "field1" : 321.0,
            "field2" : "13",
            "Car" : "camry"
        }
    ]
}

/* 3 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4e0"),
    "field1" : 323.0,
    "field2" : "12",
    "RegNo" : "1124",
    "State" : "CA",
    "coll1Docs" : [ 
        {
            "_id" : ObjectId("58b179a730c6c0da2c31a4dc"),
            "field1" : 323.0,
            "field2" : "12",
            "Car" : "accord"
        }
    ]
}

第二个管道步骤 $addFields 允许您向文档添加新字段,并输出包含输入文档和新添加字段中所有现有字段的文档。

它使用其他运算符创建一个子文档字段,该字段与两个集合上的另一个字段field2相匹配。因为上面的 $lookup 运算符会生成&#34;左连接&#34;在coll1上,生成的结果数组将包含coll1中与field1匹配的所有文档。

因此,例如,field1 = 321的第二个文档包含带有元素的coll1Docs数组

"coll1Docs" : [ 
    {
        "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
        "field1" : 321.0,
        "field2" : "12",
        "Car" : "camry"
    }, 
    {
        "_id" : ObjectId("58b179a730c6c0da2c31a4db"),
        "field1" : 321.0,
        "field2" : "13",
        "Car" : "camry"
    }
]
需要首先过滤

以获得最终的展平字段

"coll1Docs" : {
    "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
    "field1" : 321.0,
    "field2" : "12",
    "Car" : "camry"
}

使用 $filter

的内部表达式
"$filter": {
    "input": "$coll1Docs",
    "as": "doc",
    "cond": {  "$eq": [ "$field2", "$$doc.field2" ]  }                                        
}

field2进行过滤并生成结果

"coll1Docs" : [ 
    {
        "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
        "field1" : 321.0,
        "field2" : "12",
        "Car" : "camry"
    }
]

并且外部表达式 $arrayElemAt 将返回索引位置0的元素,这基本上将上面的数组展平为

"coll1Docs" : {
    "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
    "field1" : 321.0,
    "field2" : "12",
    "Car" : "camry"
}

因此管道

db.coll2.aggregate([
    {
        "$lookup": {
            "from": "coll1",
            "localField": "field1",
            "foreignField": "field1",
            "as": "coll1Docs"
        }
    },
    {
        "$addFields": {
            "coll1Docs": {
                "$arrayElemAt": [
                    {
                        "$filter": {
                            "input": "$coll1Docs",
                            "as": "doc",
                            "cond": {  "$eq": [ "$field2", "$$doc.field2" ]  }                                        
                        }
                    },
                    0
                ]
            }
        } 
    }   
])

将产生

/* 1 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4de"),
    "field1" : 321.0,
    "field2" : "12",
    "RegNo" : "1122",
    "State" : "AZ",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
        "field1" : 321.0,
        "field2" : "12",
        "Car" : "camry"
    }
}

/* 2 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4df"),
    "field1" : 321.0,
    "field2" : "13",
    "RegNo" : "1123",
    "State" : "AZ",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4db"),
        "field1" : 321.0,
        "field2" : "13",
        "Car" : "camry"
    }
}

/* 3 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4e0"),
    "field1" : 323.0,
    "field2" : "12",
    "RegNo" : "1124",
    "State" : "CA",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4dc"),
        "field1" : 323.0,
        "field2" : "12",
        "Car" : "accord"
    }
}

现在,使用 $addFields 的其他管道步骤可让您添加新字段Car

db.coll2.aggregate([
    {
        "$lookup": {
            "from": "coll1",
            "localField": "field1",
            "foreignField": "field1",
            "as": "coll1Docs"
        }
    },
    {
        "$addFields": {
            "coll1Docs": {
                "$arrayElemAt": [
                    {
                        "$filter": {
                            "input": "$coll1Docs",
                            "as": "doc",
                            "cond": {  "$eq": [ "$field2", "$$doc.field2" ]  }                                        
                        }
                    },
                    0
                ]
            }
        } 
    },  
    { "$addFields": { "Car": "$coll1Docs.Car" } }
])

将产生

/* 1 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4de"),
    "field1" : 321.0,
    "field2" : "12",
    "RegNo" : "1122",
    "State" : "AZ",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4da"),
        "field1" : 321.0,
        "field2" : "12",
        "Car" : "camry"
    },
    "Car" : "camry"
}

/* 2 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4df"),
    "field1" : 321.0,
    "field2" : "13",
    "RegNo" : "1123",
    "State" : "AZ",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4db"),
        "field1" : 321.0,
        "field2" : "13",
        "Car" : "camry"
    },
    "Car" : "camry"
}

/* 3 */
{
    "_id" : ObjectId("58b179a730c6c0da2c31a4e0"),
    "field1" : 323.0,
    "field2" : "12",
    "RegNo" : "1124",
    "State" : "CA",
    "coll1Docs" : {
        "_id" : ObjectId("58b179a730c6c0da2c31a4dc"),
        "field1" : 323.0,
        "field2" : "12",
        "Car" : "accord"
    },
    "Car" : "accord"
}

前面的 $project 管道步骤{&#34; $ project&#34;:{&#34; coll1Docs&#34;:0}}将删除{{1}输出中的字段:

coll1Docs

产生reesult

db.coll2.aggregate([
    {
        "$lookup": {
            "from": "coll1",
            "localField": "field1",
            "foreignField": "field1",
            "as": "coll1Docs"
        }
    },
    {
        "$addFields": {
            "coll1Docs": {
                "$arrayElemAt": [
                    {
                        "$filter": {
                            "input": "$coll1Docs",
                            "as": "doc",
                            "cond": {  "$eq": [ "$field2", "$$doc.field2" ]  }                                        
                        }
                    },
                    0
                ]
            }
        } 
    },  
    { "$addFields": { "Car": "$coll1Docs.Car" } }
    { "$project": { "coll1Docs": 0 } }          
])

最后一个管道阶段 $out 然后将上述结果写入指定的集合。