Using Mongo

Time: 2018-04-20 14:22:27

Tags: python json mongodb aggregate pymongo

I have a dataset (example):

{u'geometry': {u'type': u'Point', u'coordinates': [151.5162, -9.44365]}, u'_id': ObjectId('5ad70f71f2119236741ffb39'), u'type': u'Feature', u'properties': {u'POS_ID': u'592795', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:52:00.000', u'MMSI': u'636015725'}}

{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}}

{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

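I want to fetch 2 random records, but keep together all records that share the same MMSI. More specifically, as you can see, the last four records have the same MMSI. If I ask for 2 random records, I want to get back: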

{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}}

{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

The first MMSI is 636092156 and the second is 503551000 (4 records).
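To pin down the intended semantics, here is a minimal in-memory sketch in plain Python, with no database involved: group the records by MMSI, pick n distinct MMSIs at random, and return every record for each of them. The function name `sample_ships_in_memory` and the trimmed-down records are illustrative assumptions, not part of the original setup.

```python
import random
from collections import defaultdict


def sample_ships_in_memory(records, n, rng=random):
    """Return every record of n randomly chosen ships (distinct MMSI values)."""
    by_mmsi = defaultdict(list)
    for rec in records:
        by_mmsi[rec["properties"]["MMSI"]].append(rec)
    # random.sample needs a sequence (a set is rejected on Python 3.11+);
    # sorting also makes results reproducible when a seeded rng is passed in
    chosen = rng.sample(sorted(by_mmsi), n)
    return [rec for mmsi in chosen for rec in by_mmsi[mmsi]]


# The six example records, reduced to the fields that matter here
records = [
    {"properties": {"POS_ID": "592795", "STATUS": "0", "MMSI": "636015725"}},
    {"properties": {"POS_ID": "592557", "STATUS": "0", "MMSI": "636092156"}},
    {"properties": {"POS_ID": "132856", "STATUS": "15", "MMSI": "503551000"}},
    {"properties": {"POS_ID": "132856", "STATUS": "15", "MMSI": "503551000"}},
    {"properties": {"POS_ID": "132856", "STATUS": "10", "MMSI": "503551000"}},
    {"properties": {"POS_ID": "132856", "STATUS": "10", "MMSI": "503551000"}},
]
```

With n=2, the result always contains exactly two distinct MMSIs, and whenever 503551000 is picked, all four of its records come along, which matches the expected output above.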

In SQL I would want something like:

select * from table where MMSI in (select distinct MMSI from table limit 2);
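A rough MongoDB counterpart of that SQL, sketched under a few assumptions: `collection` is a pymongo `Collection` such as `db.samplecol`, and the helper names below are hypothetical. The distinct subquery becomes a `$group` on `properties.MMSI`, and `$sample` replaces `LIMIT` so the ships are actually random. The first helper runs two round trips (pick MMSIs, then `find` with `$in`); the second does everything in one pipeline by pushing each ship's records into a group and unwinding them again (`$replaceRoot` needs MongoDB 3.4 or later).

```python
# Assumes `collection` is a pymongo Collection, e.g. db.samplecol.
# Helper names are illustrative, not an existing API.


def random_mmsi_pipeline(n):
    """Stages that pick n distinct MMSI values at random."""
    return [
        {"$group": {"_id": "$properties.MMSI"}},  # one document per distinct MMSI
        {"$sample": {"size": n}},                 # keep n of them at random
    ]


def sample_ships_two_step(collection, n):
    """Roughly: SELECT * FROM t WHERE MMSI IN (SELECT DISTINCT MMSI FROM t LIMIT n)."""
    mmsis = [doc["_id"] for doc in collection.aggregate(random_mmsi_pipeline(n))]
    return list(collection.find({"properties.MMSI": {"$in": mmsis}}))


def random_ships_flat_pipeline(n):
    """Single-pipeline variant: all original records of n random ships."""
    return [
        # Gather every record of a ship into one group document
        {"$group": {"_id": "$properties.MMSI", "docs": {"$push": "$$ROOT"}}},
        {"$sample": {"size": n}},                # n ships, chosen at random
        {"$unwind": "$docs"},                    # one output document per record
        {"$replaceRoot": {"newRoot": "$docs"}},  # restore the original shape
    ]
```

Usage would be along the lines of `sample_ships_two_step(db.samplecol, 2)` or `db.samplecol.aggregate(random_ships_flat_pipeline(2))`. The single-pipeline form holds all of one ship's records in a single group document, so a ship with a very long history could hit the 16 MB document limit; the two-step form avoids that at the cost of a second query.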

So far I have this query:

getlimitShips = db.samplecol.aggregate([
    {"$lookup": {"from": "samplecol", "localField": "properties.MMSI",
                 "foreignField": "properties.MMSI", "as": "ff"}},
    {"$limit": 97},
    {"$project": {"_id": 0, "ff.properties.POS_ID": 0,
                  "ff.properties.STATUS": 0, "ff.properties.TIMESTAMP": 0}}
])

count_lim = 0
for limS in getlimitShips:
    print "SHIP:", limS["properties"]["MMSI"],"\n"
    count_lim = count_lim +1
    print "Record",count_lim,": ", limS,"\n"

It returns:

  

...

SHIP: 503551000

Record 97: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'12', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

SHIP: 503551000

Record 104: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

SHIP: 503551000

Record 105: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'4', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

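As you can see, the query returns the same ship, together with its aggregated `ff` array, as many times as that ship has records in Mongo. How can I remove these duplicates in the query so that each ship's aggregated result is returned only once?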
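The duplication comes from `$lookup` emitting one output document per input document, so a ship with four records is joined four times. One possible approach, sketched here as an assumption rather than a confirmed fix, is to collapse the collection to one document per MMSI with `$group` before the `$lookup`, so the join runs once per ship. The function name `one_doc_per_ship_pipeline` is hypothetical; `samplecol` is the collection from the question.

```python
def one_doc_per_ship_pipeline(n, from_collection="samplecol"):
    """Group first, then $lookup, so each ship appears exactly once."""
    return [
        # Collapse the collection to one document per ship (_id = MMSI)
        {"$group": {"_id": "$properties.MMSI"}},
        # Pick n ships at random; {"$limit": n} would also work if any n will do
        {"$sample": {"size": n}},
        # Attach all of that ship's records; runs once per ship, not per record
        {"$lookup": {
            "from": from_collection,
            "localField": "_id",
            "foreignField": "properties.MMSI",
            "as": "ff",
        }},
        # Trim the joined records, mirroring the exclusions in the original query
        {"$project": {"ff.properties.POS_ID": 0,
                      "ff.properties.STATUS": 0,
                      "ff.properties.TIMESTAMP": 0}},
    ]
```

Passed to `db.samplecol.aggregate(...)`, each result would have the shape `{"_id": <MMSI>, "ff": [...]}`, one per ship. Note that the original `"_id": 0` exclusion is dropped on purpose: after the `$group`, `_id` holds the MMSI.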

0 answers