I am using MongoDB with PyMongo and have the following data structure.
[
{
"position": 367,
"entropy": 0.1327801096975522,
"variants_flattened": [
"GFRHQNSEG",
"GFRHQNSEG",
"GFRHQNSEG",
"GFRHQNAEG"
],
"supports": 51,
"sequences": [
{
"position": 367,
"sequence": "GFRHQNSEG",
"count": 50,
"conservation": 98.03921568627452,
"motif_short": "I",
"motif_long": "Index",
"id": [
"APQ31289.1",
"ASU55526.1",
"ASU55528.1",
"APQ31291.1"
],
"strain": [
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
],
"country": [
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin"
],
"host": [
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
]
},
{
"position": 367,
"sequence": "GFRHQNAEG",
"count": 1,
"conservation": 1.9607843137254902,
"motif_short": "Ma",
"motif_long": "Major",
"id": [
"QBM69728.1"
],
"strain": [
"Influenza A virus A/China/70793/2016"
],
"country": [
"HA Hemagglutinin"
],
"host": [
"Influenza A virus A/China/70793/2016"
]
}
],
"variants": 2
}
]
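The root-level list contains multiple objects with a similar structure.
I need to fetch only those objects inside the "sequences" list whose "motif_short" equals "I".
The expected output is (in this particular example there is only one output object, but a single document may contain several objects that satisfy the condition):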
{
"position": 367,
"sequence": "GFRHQNSEG",
"count": 50,
"conservation": 98.03921568627452,
"motif_short": "I",
"motif_long": "Index",
"id": [
"APQ31289.1",
"ASU55526.1",
"ASU55528.1",
"APQ31291.1"
],
"strain": [
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
],
"country": [
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin"
],
"host": [
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
]
}
I am fairly new to MongoDB and have tried a few options such as aggregate, but I do not know where to start. Please help.
Thanks!
Answer 0 (score: 0)
You can solve this with an aggregation that uses $project and $filter. For this particular problem, try the following script:
# Assuming col is the PyMongo collection object
result = col.aggregate([{'$project': {'sequences': {'$filter': {
    'input': '$sequences', 'as': 's', 'cond': {'$eq': ['$$s.motif_short', 'I']}}}}}])
This query projects the sequences field and keeps only the array elements whose motif_short equals "I". As a result, you will get something like this:
{
"_id":"xyz",
"sequences":[
{
"position":367,
"sequence":"GFRHQNSEG",
"count":50,
"conservation":98.03921568627452,
"motif_short":"I",
"motif_long":"Index",
"id":[
"APQ31289.1",
"ASU55526.1",
"ASU55528.1",
"APQ31291.1"
],
"strain":[
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
],
"country":[
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin",
"HA Hemagglutinin"
],
"host":[
"Influenza A virus A/Xiamen/s200/2016",
"Influenza A virus A/Shandong-Zhifu/164/2016",
"Influenza A virus A/Shandong-Zhifu/1185/2016",
"Influenza A virus A/Xiamen/s228/2016"
]
}
]
}
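The result above still wraps the matches in a "sequences" list, one per parent document, whereas the question asks for the matching objects themselves. Below is a minimal sketch of two ways to get there, assuming a server that supports $replaceRoot (MongoDB 3.4 or later). The helper names build_pipeline and flatten_sequences are hypothetical, chosen only for illustration, and col is assumed to be the PyMongo collection object as before.

```python
def build_pipeline(motif_short):
    """Return a pipeline that yields one result per matching sequence object."""
    return [
        # Skip parent documents that contain no matching element at all.
        {"$match": {"sequences.motif_short": motif_short}},
        # Keep only the matching elements of the sequences array.
        {"$project": {"_id": 0, "sequences": {"$filter": {
            "input": "$sequences", "as": "s",
            "cond": {"$eq": ["$$s.motif_short", motif_short]}}}}},
        # Emit one document per remaining array element ...
        {"$unwind": "$sequences"},
        # ... and promote that element to the top level.
        {"$replaceRoot": {"newRoot": "$sequences"}},
    ]


def flatten_sequences(results):
    """Client-side alternative: collect the filtered sub-documents
    from the output of the $project/$filter query shown earlier."""
    return [seq for doc in results for seq in doc.get("sequences", [])]


# Usage (requires a running MongoDB instance, so it is left commented out):
# matches = list(col.aggregate(build_pipeline("I")))
# or, with the original query:
# matches = flatten_sequences(result)
```

The server-side variant avoids transferring empty "sequences" lists for documents without a match; the client-side helper may be simpler if the original query is already in place.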