How to query subdocuments that have variable key names

Time: 2018-07-03 20:51:40

Tags: python mongodb pymongo

I have a JSON document imported into MongoDB that looks similar to the following test data:

[
    {
        "subject_id": "1",
        "name": "Bob",
        "dob": "12/31/00",
        "gender": "Male",
        "visits": {
            "12/31/15": {
                "age": "17",
                "visit_category": "Baseline Visit"
            },
            "12/31/16": {
                "age": "18",
                "visit_category": "Follow Up Visit"
            },
            "12/31/17": {
                "age": "18",
                "visit_category": "Follow Up Visit"
            }
        },
        "samples": {
            "XXX123": {
                "completed_by": "Sally",
                "label_on_sample": "1"
            }
        }
    },
    {
        "subject_id": "2",
        "name": null,
        "dob": "1/1/01",
        "gender": "Female",
        "visits": {
            "1/1/11": {
                "age": "10",
                "visit_category": "Baseline Visit"
            },
            "1/1/12": {
                "age": "11",
                "visit_category": "Follow Up Visit"
            },
            "1/1/13": {
                "age": "12",
                "visit_category": "Follow Up Visit"
            },
            "1/1/14": {
                "age": "13",
                "visit_category": "Follow Up Visit"
            },
            "1/1/15": {
                "age": "14",
                "visit_category": "Follow Up Visit"
            }
        },
        "samples": {
            "YYY456": {
                "completed_by": null,
                "label_on_sample": "2"
            },
            "ZZZ789": {
                "completed_by": "Sally",
                "label_on_sample": "2"
            }
        }
    }
]

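I want to query information inside the visit dates or the samples, but I believe I'm getting confused because the keys are variable. What is the best way to query across all of the subdocuments?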

filter_by = {'subject.samples': {'$elemMatch': {'visit_category': "Follow Up Visit" }}}
data = db['subject'].find(filter_by)
print(data.count())

This returns 0. How can I format some kind of wildcard after 'subject.samples' to make this work?

Thanks.

1 Answer:

Answer 0: (score: 1)

First, you may want to correct your document structure so that the visits key holds an array of visits.
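As a rough illustration of that restructuring, the sketch below turns the date-keyed `visits` object into a list of visit documents before it is stored. The helper name `visits_to_array` and the `date` field are hypothetical choices made for this example, not something from the question. Once `visits` is an array, an `$elemMatch` filter like the one attempted in the question can be applied to it directly.

```python
def visits_to_array(doc):
    """Return a copy of doc whose 'visits' mapping is a list of visit dicts.

    The original date key is kept under a 'date' field (an assumed name).
    """
    restructured = dict(doc)
    restructured["visits"] = [
        {"date": date, **details}
        for date, details in doc.get("visits", {}).items()
    ]
    return restructured


# Trimmed-down version of subject 1 from the question
subject = {
    "subject_id": "1",
    "visits": {
        "12/31/15": {"age": "17", "visit_category": "Baseline Visit"},
        "12/31/16": {"age": "18", "visit_category": "Follow Up Visit"},
    },
}
restructured = visits_to_array(subject)

# With visits stored as an array, $elemMatch works without any wildcard.
# This filter would be passed to find() on the collection holding the
# restructured documents.
filter_by = {"visits": {"$elemMatch": {"visit_category": "Follow Up Visit"}}}
```

The same idea would apply to `samples`, with the sample ID moved into a field of each array element.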

Mongo does allow a pipeline query that converts an object to an array, but I don't think it would scale easily to large collections without considering other ways to optimize the search.

For now, here is a query for the total number of visits that match "Follow Up Visit":

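pipeline = [
    # Turn the visits object into an array of {k: <date>, v: <visit>} pairs
    {
        '$project': {
            'visits': {'$objectToArray': '$visits'}
        }
    },
    # Emit one document per visit
    {
        '$unwind': '$visits'
    },
    # Keep only the follow-up visits
    {
        '$match': {
            'visits.v.visit_category': 'Follow Up Visit'
        }
    },
    # Count the documents that remain
    {
        '$count': 'count'
    }
]
# db is assumed to be an existing pymongo Database handle; use whichever
# collection holds your documents (the question uses db['subject'])
cur = db.patient.aggregate(pipeline)
# $count emits no document when nothing matches, so fall back to None
result = next(cur, None)

print(result['count'] if result else 0)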
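The question also asks about `samples`, and the same pipeline shape should carry over to it. The sketch below wraps the stages in a small builder so the field name and match condition can vary. `count_pipeline` is a hypothetical helper written for this example, and the `aggregate` call is left commented out because it needs a live connection.

```python
def count_pipeline(field, key, value):
    """Build a pipeline that counts the entries of an object-valued field
    whose subdocument has key == value."""
    return [
        # Convert the object into an array of {k, v} pairs
        {"$project": {field: {"$objectToArray": f"${field}"}}},
        # One output document per array element
        {"$unwind": f"${field}"},
        # Match on the subdocument stored under v
        {"$match": {f"{field}.v.{key}": value}},
        {"$count": "count"},
    ]


samples_pipeline = count_pipeline("samples", "completed_by", "Sally")
visits_pipeline = count_pipeline("visits", "visit_category", "Follow Up Visit")

# With a live connection (db assumed to be a pymongo Database handle):
# result = next(db.patient.aggregate(samples_pipeline), None)
```

The original key (the visit date or the sample ID) stays available as `<field>.k` after the `$unwind` stage, so it can be matched or projected as well.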