I'm using Python with MongoDB. I need to query documents from the database and save some information from each of them. My current code is:
for trips in trip.find({}, {'latlng_start': 1, 'latlng_end': 1, 'trip_data': 1, 'trip_id': 1}).batch_size(500):
    orig_coord = trips['latlng_start']['coordinates']
    dest_coord = trips['latlng_end']['coordinates']
    cell_start = citymap.find({"trips_orig": {"$exists": True}, "cell_latlng": {"$geoIntersects": {"$geometry": {"type": "Point", "coordinates": orig_coord}}}})
    cell_end = citymap.find({"trips_dest": {"$exists": True}, "cell_latlng": {"$geoIntersects": {"$geometry": {"type": "Point", "coordinates": dest_coord}}}})
    if cell_start.count() == 1 and cell_end.count() == 1 and cell_start[0]['big_cell8']['POI'] != {} and cell_end[0]['big_cell8']['POI'] != {}:
        try:
            labels_raw.append(purpose_mapping[trips['trip_data']['purpose']])
            user_ids_raw.append(int(trips['trip_id'][:10]))
            venue_feature_start.append([cell_start[0]['big_cell8']['POI'], orig_coord])
            venue_feature_end.append([cell_end[0]['big_cell8']['POI'], dest_coord])
        except (KeyError, ValueError):
            continue
    else:
        continue
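One thing worth noting about the loop above: `cell_start.count()` and each `cell_start[0]` access can hit the server separately, so the same geo query may run several times per trip. A simplified sketch of the per-trip bookkeeping with the cursors materialized once into plain lists (the helper name and the narrowed exception handling are my own additions, not from the original code; in the real loop the lists would come from `list(citymap.find(...))`):

```python
# Hypothetical helper: given one trip document and the already-materialized
# cell lookups (plain lists of dicts), decide whether to keep the trip and
# extract its features. Field names mirror the collections in the question.
def extract_features(trip, start_cells, end_cells, purpose_mapping):
    """Return (label, user_id, venue_start, venue_end), or None to skip."""
    # Keep the trip only if each point falls in exactly one cell.
    if len(start_cells) != 1 or len(end_cells) != 1:
        return None
    start, end = start_cells[0], end_cells[0]
    # Both cells must carry non-empty POI information.
    if not start['big_cell8']['POI'] or not end['big_cell8']['POI']:
        return None
    try:
        label = purpose_mapping[trip['trip_data']['purpose']]
        user_id = int(trip['trip_id'][:10])
    except (KeyError, ValueError):
        return None
    orig = trip['latlng_start']['coordinates']
    dest = trip['latlng_end']['coordinates']
    return (label, user_id,
            [start['big_cell8']['POI'], orig],
            [end['big_cell8']['POI'], dest])
```

With this shape, each `citymap.find(...)` result is consumed exactly once per trip instead of being re-queried by `count()` and the `[0]` indexing.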
I have created 2dsphere indexes on the citymap collection; its indexes are:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "CitySeg2014.grid750"
},
{
"v" : 1,
"key" : {
"latlng" : "2dsphere"
},
"name" : "latlng_2dsphere",
"ns" : "CitySeg2014.grid750",
"2dsphereIndexVersion" : 2
},
{
"v" : 1,
"key" : {
"cell_latlng" : "2dsphere"
},
"name" : "cell_latlng_2dsphere",
"ns" : "CitySeg2014.grid750",
"2dsphereIndexVersion" : 2
},
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "trips_dest_text_trips_orig_text",
"ns" : "CitySeg2014.grid750",
"weights" : {
"trips_dest" : 1,
"trips_orig" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 2
}
]
The problem is that although there are only 47,000 trips and citymap contains only 11,600 documents, the query takes about 3,000 seconds! Yet when I ran the same program this morning, it took about 800 seconds. I don't know why this happens. Any ideas on how to improve efficiency?
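One way to check whether the `cell_latlng_2dsphere` index is actually being used is to inspect the query plan with `explain()`. A minimal sketch follows: in pymongo the plan would come from something like `citymap.find({...}).explain()` (this assumes the newer `queryPlanner` explain format), while the `uses_index` helper below is hypothetical and only walks the returned plan dict, so it runs without a server:

```python
# Hypothetical helper: walk an explain() plan and report whether the
# winning plan contains an IXSCAN stage (i.e. an index was used).
# A COLLSCAN stage instead would mean the geo query scans the whole
# collection, which could explain the long running times.
def uses_index(explain_output):
    """Return True if the winning plan includes an IXSCAN stage."""
    stage = explain_output.get('queryPlanner', {}).get('winningPlan', {})
    while stage:
        if stage.get('stage') == 'IXSCAN':
            return True
        stage = stage.get('inputStage', {})  # descend into nested stages
    return False
```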