I have a dataset (example):
{u'geometry': {u'type': u'Point', u'coordinates': [151.5162, -9.44365]}, u'_id': ObjectId('5ad70f71f2119236741ffb39'), u'type': u'Feature', u'properties': {u'POS_ID': u'592795', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:52:00.000', u'MMSI': u'636015725'}}
{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}}
{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}
{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000',u'MMSI': u'503551000'}}
{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000' }}
{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}
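I want to fetch 2 random records, but keep every record that shares the same MMSI. More specifically, as you can see, the last four records have the same MMSI. If I ask for 2 random records, I would like this to be returned: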
{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}}
{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}
{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}
{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000' }}
{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}
The first MMSI = 636092156 and the second MMSI = 503551000 (4 records).
In SQL I would want something like:
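To make the intended selection concrete, here is a minimal in-memory sketch in plain Python (not MongoDB). The function name `pick_ships` and the trimmed `records` list are hypothetical and only for illustration: pick 2 distinct MMSI values at random, then keep every record whose MMSI is one of them.

```python
import random

def pick_ships(records, n, rng=random):
    """Return every record belonging to n randomly chosen distinct MMSIs."""
    # Distinct MMSI values, sorted so that sampling works on a list
    mmsis = sorted({r["properties"]["MMSI"] for r in records})
    chosen = set(rng.sample(mmsis, min(n, len(mmsis))))
    # Keep all records of the chosen ships, in their original order
    return [r for r in records if r["properties"]["MMSI"] in chosen]

# Trimmed copies of the sample documents (only MMSI and coordinates kept)
records = [
    {"properties": {"MMSI": "636015725"}, "geometry": {"coordinates": [151.5162, -9.44365]}},
    {"properties": {"MMSI": "636092156"}, "geometry": {"coordinates": [119.0369, -0.3608933]}},
    {"properties": {"MMSI": "503551000"}, "geometry": {"coordinates": [158.1707, -0.9142034]}},
    {"properties": {"MMSI": "503551000"}, "geometry": {"coordinates": [158.2707, -0.8142034]}},
    {"properties": {"MMSI": "503551000"}, "geometry": {"coordinates": [157.1707, -0.9142034]}},
    {"properties": {"MMSI": "503551000"}, "geometry": {"coordinates": [157.1707, -0.9142034]}},
]

selected = pick_ships(records, 2)
```

The number of records returned depends on which ships are drawn: with this sample it is 2 records if the two single-record ships are picked, and 5 if 503551000 is one of them.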
select * from table where MMSI in (select distinct(MMSI) from table limit 2);
So far I have this query:
getlimitShips = db.samplecol.aggregate([
    # Self-join: attach every record with the same MMSI as "ff"
    {"$lookup": {"from": "samplecol", "localField": "properties.MMSI", "foreignField": "properties.MMSI", "as": "ff"}},
    {"$limit": 97},
    {"$project": {"_id": 0, "ff.properties.POS_ID": 0, "ff.properties.STATUS": 0, "ff.properties.TIMESTAMP": 0}},
])
count_lim = 0
for limS in getlimitShips:
    print "SHIP:", limS["properties"]["MMSI"], "\n"
    count_lim = count_lim + 1
    print "Record", count_lim, ": ", limS, "\n"
It returns:
...
...
SHIP:503551000
Record 97: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'12', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}
SHIP:503551000
Record 104: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}
SHIP:503551000
Record 105: {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'4', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}
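As you can see, the query returns the same ship, with the same aggregation result, as many times as that ship is recorded in mongo. How can I remove the duplicates in the query so that each aggregation result is returned only once?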
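One direction that might avoid the repetition, sketched here as an assumption rather than a confirmed solution: group by MMSI first, so each ship becomes a single document holding all of its records, and only then pick ships at random with `$sample` (this assumes a server that supports `$sample`, i.e. MongoDB 3.2 or later). The helper name `ships_pipeline` and the output field name `records` are hypothetical, and the pipeline has not been run against the real collection.

```python
def ships_pipeline(n):
    """Build an aggregation pipeline: one document per MMSI, then n random ships."""
    return [
        # Collapse all documents with the same MMSI into one group,
        # collecting the full original documents in "records"
        {"$group": {"_id": "$properties.MMSI", "records": {"$push": "$$ROOT"}}},
        # Randomly keep n of those per-ship groups
        {"$sample": {"size": n}},
    ]

pipeline = ships_pipeline(2)
# Possible usage with the collection from the question:
# for ship in db.samplecol.aggregate(pipeline):
#     ship["_id"] is the MMSI, ship["records"] holds all of its documents
```

Because the grouping happens before the sampling, each MMSI can appear at most once in the result, and no self-`$lookup` is needed.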