pymongo aggregate - return a count for each field value

Date: 2017-02-01 22:23:23

Tags: mongodb pymongo

I am trying to do a group-by in MongoDB, using the pymongo interface.

A sample of my data:

{
    "transmitters": [], 
    "receptors": [], 
    "text": "\"We have developed CellExcite, a sophisticated simulation environment for excitable-cell networks. CellExcite allows the user to sketch a tissue of excitable cells, plan the stimuli to be applied during simulation, and customize the diffusion model. CellExcite adopts Hybrid Automata (HA) as the computational model in order to efficiently capture both discrete and continuous excitable-cell behavior.\"", 
    "genes": [], 
    "simenvironment": [
        "CellExcite (web link to model)"
    ], 
    "channels": [], 
    "references": [
        112450
    ], 
    "modelconcepts": [
        "Spatio-temporal Activity Patterns", 
        "Simplified Models"
    ], 
    "celltypes": [
        "Heart cell", 
        "Squid axon"
    ], 
    "title": "CellExcite: an efficient simulation environment for excitable cells (Bartocci et al. 2008)", 
    "modeltype": [
        "Neuron or other electrically excitable cell"
    ], 
    "brainregions": [], 
    "_id": 112468
}, 

I want to get the number of models for each cell type. As shown above, a single model can have multiple cell types. How can I do this?

Here is my attempt:

from pprint import pprint

# `models` is the pymongo collection holding the documents shown above.
pipeline = [{'$group': {'_id': '$celltypes', 'num_models': {'$sum': 1}}},
            {'$project': {'celltypes': 1, 'num_models': 1}}]
for doc in models.aggregate(pipeline):
    pprint(doc)
    break

Here are my results:

{u'_id': [u'Heart cell'], u'num_models': 6}
...snip...
{u'_id': [u'Heart cell', u'Squid axon'], u'num_models': 1}

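I apologize for the output; I have many more models, and it actually prints all of them.

Can anyone give me a hint as to where I might be going wrong? All I want is a list of cell types and the number of models each one appears in.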

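1 answer:

Answer 0: (score: 2)

You are almost there. All you need to do is $unwind the celltypes field: because it is an array, $group treats the whole array as the key, whereas after $unwind you can group by each value separately: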

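pipeline = [
    # Emit one document per element of the celltypes array.
    {'$unwind': '$celltypes'},
    # Group by the individual cell type and count the documents in each group.
    {'$group': {
        '_id': '$celltypes',
        'num_models': {'$sum': 1}}
    },
    # After $group the cell type is stored in _id; there is no 'celltypes'
    # field any more, so only _id and num_models appear in the output.
    {'$project': {
        'celltypes': 1,
        'num_models': 1}
    }
]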
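
To see what the $unwind and $group stages compute without a running MongoDB server, here is a minimal pure-Python sketch. The three sample documents and the function name count_models_per_celltype are made up for illustration; this is not pymongo code, only an emulation of the counting logic under the assumption that celltypes is always a list.

```python
from collections import Counter

# Hypothetical sample documents, trimmed to the fields that matter here.
models = [
    {"_id": 1, "celltypes": ["Heart cell", "Squid axon"]},
    {"_id": 2, "celltypes": ["Heart cell"]},
    {"_id": 3, "celltypes": []},  # an empty array contributes nothing
]


def count_models_per_celltype(docs):
    """Emulate {'$unwind': '$celltypes'} followed by $group with {'$sum': 1}."""
    counts = Counter()
    for doc in docs:
        # $unwind: one row per element of the array.
        for celltype in doc.get("celltypes", []):
            # $group: add 1 for every row that shares the same key.
            counts[celltype] += 1
    # Shape the result like the aggregation output documents.
    return [{"_id": name, "num_models": n} for name, n in counts.items()]


result = count_models_per_celltype(models)
for row in result:
    print(row)
```

Against the real collection, each result document would have the shape {'_id': <cell type>, 'num_models': <count>}. By default $unwind produces no output for a document whose celltypes array is empty, so such models are not counted under any cell type.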