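I'd like to know the best way to remove duplicate documents from a large GeoJSON collection (about 80k rows) stored in MongoDB. I believe the duplicates are causing errors on the front end, since I can't log the full collection to the console.
I tried using the dropDups option in the mongo shell, as described in "MongoDB query to remove duplicate documents from a collection", but without success. Also, I believe dropDups has been deprecated since MongoDB 2.6.
Here is an example of my schema structure: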
{
  "type": "FeatureCollection",
  "features": [
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.994720, 40.686902]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.994720, 40.686902]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.989205, 40.686675]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.994655, 40.687391]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.985557, 40.687683]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.985557, 40.687683]
      }
    },
    {
      "geometry": {
        "type": "Point",
        "coordinates": [-73.984656, 40.685462]
      }
    }
  ]
}
Here are my index-creation attempts in the mongo shell. The duplicates are still there!
> db.testschema.createIndex( { coordinates: 1 }, { unique: true, dropdups: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.testschema.createIndex( { geometry: 1 }, { unique: true, dropdups: true } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
> db.testschema.ensureIndex({'testschema.features.geometry.coordinates': 1}, {unique: true, dropdups: true})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}
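
A possible reason these attempts leave the duplicates in place, offered as a hedged reading rather than a confirmed diagnosis: in the sample above, the repeated points sit inside the `features` array of a single document, and a unique index in MongoDB constrains values across separate documents, not repeated values within one document's array. Under that assumption, one option is to deduplicate the array in plain JavaScript and write it back. The sketch below is a minimal illustration only; `dedupeFeatures` is a hypothetical helper name, and it assumes every feature has a `geometry.coordinates` array.

```javascript
// Hypothetical helper: keep the first feature seen for each coordinate pair.
function dedupeFeatures(features) {
  const seen = new Set();
  return features.filter((feature) => {
    // Build a string key from the coordinates so array values can be compared.
    const key = JSON.stringify(feature.geometry.coordinates);
    if (seen.has(key)) {
      return false; // same coordinates already kept, so drop this one
    }
    seen.add(key);
    return true;
  });
}

// Sample data taken from the schema example above (shortened).
const sample = {
  type: "FeatureCollection",
  features: [
    { geometry: { type: "Point", coordinates: [-73.994720, 40.686902] } },
    { geometry: { type: "Point", coordinates: [-73.994720, 40.686902] } },
    { geometry: { type: "Point", coordinates: [-73.989205, 40.686675] } },
  ],
};

const unique = dedupeFeatures(sample.features);
console.log(unique.length); // 2
```

Assuming the collection holds one FeatureCollection document fetched with `db.testschema.findOne()` into a variable `doc`, the cleaned array could then be saved from the shell with something like `db.testschema.updateOne({ _id: doc._id }, { $set: { features: dedupeFeatures(doc.features) } })`. If each point were instead stored as its own document, a different approach (for example grouping on the coordinates with an aggregation) would be needed.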