MongoDB - Return specific document fields

Time: 2016-05-18 06:42:01

Tags: mongodb

My MongoDB documents have the following structure. As you can see, I have 3 URLs, each with crawled set to either True or False.

{
    "_id": {
        "$oid": "573b8e70e1054c00151152f7"
    },
    "domain": "example.com",
    "good": [
        {
            "crawled": true,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/11005-Cheap-booze!"
        },
        {
            "crawled": false,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/9445-This-week-s-voucher-codes"
        },
        {
            "crawled": false,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/9445-This-week-s-voucher-codes_2"
        }
    ],

    "link_found": false,
    "subdomain": "http://www."
}

I am trying to return specific fields for only those URLs that have crawled set to False. For that I have the following query:

.find({'good.crawled' : False}, {'good.link':True, 'domain':True, 'subdomain':True})

However, the result is not what I expected: it returns all of the URLs, regardless of whether their crawled status is True or False.

What is returned:

{
    u'domain': u'cashquestions.com',
    u'_id': ObjectId('573b8e70e1054c00151152f7'),
    u'subdomain': u'http://www.',
    u'good': [
         {
             u'link': u'/threads/11005-Cheap-booze!'
         },
        {
             u'link': u'/threads/9445-This-week-s-voucher-codes'
        },
        {
             u'link': u'/threads/9445-This-week-s-voucher-codes_2'
        } 
             ]
}

Expected result:

{
    u'domain': u'cashquestions.com',
    u'_id': ObjectId('573b8e70e1054c00151152f7'),
    u'subdomain': u'http://www.',
    u'good': [
        {
             u'link': u'/threads/9445-This-week-s-voucher-codes'
        },
        {
             u'link': u'/threads/9445-This-week-s-voucher-codes_2'
        } 
             ]
}

How can I specify that only the links with crawled set to False should be returned?

1 answer:

Answer 0: (score: 0)

You want to use the aggregation framework (this will work in MongoDB 3.0 and later):

db.yourcollection.aggregate([
    // optional: only documents with at least one uncrawled link
    {$match: {'good.crawled': false}},
    // keep just the fields you need (plus _id)
    {$project: {good: 1, domain: 1, subdomain: 1}},
    // emit each element of the good array as a separate temporary document
    {$unwind: '$good'},
    // keep only the elements with crawled set to false
    {$match: {'good.crawled': false}},
    // undo the $unwind: rebuild one document per _id
    {$group: {_id: '$_id', domain: {$first: '$domain'}, subdomain: {$first: '$subdomain'}, good: {$push: '$good'}}}
])
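
Since the question runs its query from Python, the same pipeline can also be written as plain dictionaries and passed to PyMongo. The following is only a sketch under some assumptions: the function names build_uncrawled_pipeline and find_uncrawled are hypothetical, and collection is assumed to be a pymongo Collection object. Note that $unwind takes the field path as a string ("$good"), and that this version pushes only good.link in the final stage so that the output has the shape shown under "Expected result" in the question.

```python
def build_uncrawled_pipeline():
    """Return an aggregation pipeline that keeps only links with crawled == False."""
    return [
        # Optional: skip documents that have no uncrawled link at all.
        {"$match": {"good.crawled": False}},
        # Keep only the fields that are needed (_id is kept by default).
        {"$project": {"good": 1, "domain": 1, "subdomain": 1}},
        # Emit one temporary document per element of the good array.
        {"$unwind": "$good"},
        # Keep only the elements that have not been crawled.
        {"$match": {"good.crawled": False}},
        # Rebuild one document per _id, collecting the remaining links.
        {"$group": {
            "_id": "$_id",
            "domain": {"$first": "$domain"},
            "subdomain": {"$first": "$subdomain"},
            "good": {"$push": {"link": "$good.link"}},
        }},
    ]


def find_uncrawled(collection):
    """Run the pipeline; collection is assumed to be a pymongo Collection."""
    return list(collection.aggregate(build_uncrawled_pipeline()))
```

With a connected client this would be called as find_uncrawled(db.yourcollection), where db is whatever database handle the application already uses.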