Checking a condition in a deeply nested list

Time: 2020-10-09 12:03:30

Tags: python mongodb pymongo

I am using MongoDB with PyMongo and have the following data structure.

[
    {
        "position": 367,
        "entropy": 0.1327801096975522,
        "variants_flattened": [
            "GFRHQNSEG",
            "GFRHQNSEG",
            "GFRHQNSEG",
            "GFRHQNAEG"
        ],
        "supports": 51,
        "sequences": [
            {
                "position": 367,
                "sequence": "GFRHQNSEG",
                "count": 50,
                "conservation": 98.03921568627452,
                "motif_short": "I",
                "motif_long": "Index",
                "id": [
                    "APQ31289.1",
                    "ASU55526.1",
                    "ASU55528.1",
                    "APQ31291.1"
                ],
                "strain": [
                    "Influenza A virus A/Xiamen/s200/2016",
                    "Influenza A virus A/Shandong-Zhifu/164/2016",
                    "Influenza A virus A/Shandong-Zhifu/1185/2016",
                    "Influenza A virus A/Xiamen/s228/2016"
                ],
                "country": [
                    "HA Hemagglutinin",
                    "HA Hemagglutinin",
                    "HA Hemagglutinin",
                    "HA Hemagglutinin"
                ],
                "host": [
                    "Influenza A virus A/Xiamen/s200/2016",
                    "Influenza A virus A/Shandong-Zhifu/164/2016",
                    "Influenza A virus A/Shandong-Zhifu/1185/2016",
                    "Influenza A virus A/Xiamen/s228/2016"
                ]
            },
            {
                "position": 367,
                "sequence": "GFRHQNAEG",
                "count": 1,
                "conservation": 1.9607843137254902,
                "motif_short": "Ma",
                "motif_long": "Major",
                "id": [
                    "QBM69728.1"
                ],
                "strain": [
                    "Influenza A virus A/China/70793/2016"
                ],
                "country": [
                    "HA Hemagglutinin"
                ],
                "host": [
                    "Influenza A virus A/China/70793/2016"
                ]
            }
        ],
        "variants": 2
    }
]

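The root-level list contains multiple objects with a similar structure.

From each instance, I need to get only those objects in the "sequences" list whose "motif_short" equals "I".

The expected output is shown below. In this particular example there is only one output object, but a single instance can contain several objects that meet the condition: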

{
    "position": 367,
    "sequence": "GFRHQNSEG",
    "count": 50,
    "conservation": 98.03921568627452,
    "motif_short": "I",
    "motif_long": "Index",
    "id": [
        "APQ31289.1",
        "ASU55526.1",
        "ASU55528.1",
        "APQ31291.1"
    ],
    "strain": [
        "Influenza A virus A/Xiamen/s200/2016",
        "Influenza A virus A/Shandong-Zhifu/164/2016",
        "Influenza A virus A/Shandong-Zhifu/1185/2016",
        "Influenza A virus A/Xiamen/s228/2016"
    ],
    "country": [
        "HA Hemagglutinin",
        "HA Hemagglutinin",
        "HA Hemagglutinin",
        "HA Hemagglutinin"
    ],
    "host": [
        "Influenza A virus A/Xiamen/s200/2016",
        "Influenza A virus A/Shandong-Zhifu/164/2016",
        "Influenza A virus A/Shandong-Zhifu/1185/2016",
        "Influenza A virus A/Xiamen/s228/2016"
    ]
}

I am fairly new to MongoDB and have tried a few options such as aggregate, but I do not know where to start. Please help me out.

Thanks!

1 Answer:

Answer 0: (score: 0)

You can solve this with an aggregation that combines $project and $filter. For this particular problem, try the following script:

# Assuming col is our collection object in PyMongo
result = col.aggregate([
    {'$project': {'sequences': {'$filter': {
        'input': '$sequences',
        'as': 's',
        'cond': {'$eq': ['$$s.motif_short', 'I']},
    }}}}
])

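This query projects the sequences field and filters it so that only the elements whose motif_short equals "I" remain. Each document in the result looks something like this: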

{
  "_id":"xyz",
  "sequences":[
    {
      "position":367,
      "sequence":"GFRHQNSEG",
      "count":50,
      "conservation":98.03921568627452,
      "motif_short":"I",
      "motif_long":"Index",
      "id":[
        "APQ31289.1",
        "ASU55526.1",
        "ASU55528.1",
        "APQ31291.1"
      ],
      "strain":[
        "Influenza A virus A/Xiamen/s200/2016",
        "Influenza A virus A/Shandong-Zhifu/164/2016",
        "Influenza A virus A/Shandong-Zhifu/1185/2016",
        "Influenza A virus A/Xiamen/s228/2016"
      ],
      "country":[
        "HA Hemagglutinin",
        "HA Hemagglutinin",
        "HA Hemagglutinin",
        "HA Hemagglutinin"
      ],
      "host":[
        "Influenza A virus A/Xiamen/s200/2016",
        "Influenza A virus A/Shandong-Zhifu/164/2016",
        "Influenza A virus A/Shandong-Zhifu/1185/2016",
        "Influenza A virus A/Xiamen/s228/2016"
      ]
    }
  ]
}
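
The result above still wraps the matching objects in a document with "_id" and "sequences", whereas the question asks for the bare objects. One possible way to get them directly is to unwind the array and promote each matching element with $replaceRoot (available since MongoDB 3.4). The following is only a sketch: the function names build_flat_pipeline and filter_in_memory are made up for illustration, and the second function is a plain-Python equivalent for documents that are already loaded as dicts.

```python
def build_flat_pipeline(motif_short):
    """Return an aggregation pipeline that emits matching sequence objects directly."""
    return [
        # Skip documents that contain no matching element at all.
        {"$match": {"sequences.motif_short": motif_short}},
        # Emit one document per element of the sequences array.
        {"$unwind": "$sequences"},
        # Keep only the unwound elements that satisfy the condition.
        {"$match": {"sequences.motif_short": motif_short}},
        # Promote the embedded object to the top level (drops _id and the wrapper).
        {"$replaceRoot": {"newRoot": "$sequences"}},
    ]


def filter_in_memory(documents, motif_short):
    """Plain-Python equivalent for documents already loaded as dicts."""
    return [
        seq
        for doc in documents
        for seq in doc.get("sequences", [])
        if seq.get("motif_short") == motif_short
    ]


# Hypothetical usage, with col being the PyMongo collection from the answer:
#     for seq in col.aggregate(build_flat_pipeline("I")):
#         print(seq["sequence"], seq["count"])
```

Compared with $project and $filter, this returns one result per matching element rather than one per parent document, so documents without any match produce no output.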