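Every day I store billions of documents for a department store (Men, Women, etc.).
id_department: the location of the department; area_type: the name of the department's section (e.g. shoes, fashion, etc.)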
(_id:59e86325dc03580bdbf2347f
date:20170906
id_department:2640
goinside_type:2
area_type:1)
(_id:59e86325dc03580bdbf2347f
date:20170906
id_department:2642
goinside_type:3
area_type:2)
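I want to write a query that returns the number of visits to each area_type within a time range. The problem is that there can be more than 1000 area_type values and the conditions for each area_type can differ (so I cannot simply group by area_type in this case). My pipeline is very long, which hurts performance.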
$pipeline = [
    // Keep only the stations and the overall date range of interest.
    [
        '$match' => [
            'id_station' => ['$in' => [2640, 2642, 2644]],
            'date' => ['$gte' => 20170802, '$lte' => 20170930],
        ],
    ],
    // One conditional counter per area_type and period.
    [
        '$group' => [
            '_id' => ['id_station' => '$id_station'],
            'total_entries - area1' => [
                '$sum' => [
                    '$cond' => [
                        ['$and' => [
                            ['$eq' => ['$area_type', 1]],
                            ['$gte' => ['$date', 20170901]],
                            ['$lte' => ['$date', 20170930]],
                        ]],
                        1,
                        0,
                    ],
                ],
            ],
            'total_entries - area1previous' => [
                '$sum' => [
                    '$cond' => [
                        ['$and' => [
                            ['$eq' => ['$area_type', 1]],
                            ['$gte' => ['$date', 20170802]],
                            ['$lte' => ['$date', 20170831]],
                        ]],
                        1,
                        0,
                    ],
                ],
            ],
            'total_entries - area2' => [
                '$sum' => [
                    '$cond' => [
                        ['$and' => [
                            ['$eq' => ['$area_type', 2]],
                            ['$gte' => ['$date', 20170901]],
                            ['$lte' => ['$date', 20170930]],
                        ]],
                        1,
                        0,
                    ],
                ],
            ],
            'total_entries - area2previous' => [
                '$sum' => [
                    '$cond' => [
                        ['$and' => [
                            ['$eq' => ['$area_type', 2]],
                            ['$gte' => ['$date', 20170802]],
                            ['$lte' => ['$date', 20170831]],
                        ]],
                        1,
                        0,
                    ],
                ],
            ],
        ],
    ],
];
$cursor = $collection->aggregate($pipeline, ['allowDiskUse' => true]);
Any ideas on how to solve this?
Answer 0 (score: 0)
The most important thing here is to create an index on the date and id_department / id_station fields (I suspect the latter two are the same field).
collection.createIndex({
"id_department" : 1,
"date" : 1
})
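This will speed up your $match stage, after which only a small number of documents should be left for the subsequent pipeline stages to process.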
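Once you have measured the resulting performance and found it insufficient, you can try to optimize the query itself (e.g. by extracting the repeated date filters into a separate, temporary $project stage, or into the $group stage you already have).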
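As a rough illustration of that last idea, the sketch below builds a pipeline in which the date comparison is evaluated once per document in a $project stage, and the $group stage then keys on station, area_type, and period instead of carrying one $cond per output field. It only constructs the pipeline as plain data and does not connect to MongoDB. The function name buildPipeline, its parameters, and the field names period and entries are made up for this example. It also assumes that every area_type uses the same two date windows, which the question says is not always the case, so treat it as a starting point rather than a drop-in replacement.

```javascript
// Sketch only: builds an aggregation pipeline as plain data.
// buildPipeline and the "period" / "entries" names are hypothetical.
function buildPipeline(stationIds, rangeStart, currentStart, rangeEnd) {
  return [
    // Same filter as the original $match; this is the stage an index can serve.
    {
      $match: {
        id_station: { $in: stationIds },
        date: { $gte: rangeStart, $lte: rangeEnd },
      },
    },
    // Classify each document once. $match already bounds the overall range,
    // so anything before currentStart falls into the previous period.
    {
      $project: {
        id_station: 1,
        area_type: 1,
        period: {
          $cond: [{ $gte: ["$date", currentStart] }, "current", "previous"],
        },
      },
    },
    // One group per (station, area_type, period) instead of one $cond per field.
    {
      $group: {
        _id: {
          id_station: "$id_station",
          area_type: "$area_type",
          period: "$period",
        },
        entries: { $sum: 1 },
      },
    },
  ];
}

const pipeline = buildPipeline([2640, 2642, 2644], 20170802, 20170901, 20170930);
console.log(JSON.stringify(pipeline, null, 2));
```

The result has one document per (station, area_type, period) combination rather than one wide document per station, so the application would need to pivot the rows itself. If some area types really do need different date windows, the period expression would have to become more involved, and whether this is faster than the original should be measured rather than assumed.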