I have a collection that holds analytics data (currently more than 10 million documents).
Example of a document from the clicks collection:
{
  serverId: 'srv1',
  dateVisited: '2014-12-24',
  campaignId: 'c1',
  ...
  landingpageClicks: [
    {...},
    {...}
  ],
  offerTrackings: [
    {
      amount: 10
    },
    {
      amount: 22
    },
    {
      amount: 18
    }
  ]
}
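I need to extract reports from this collection. The user can request grouping by several fields, for example by date, then by serverId, then by campaignId, and the report should look like this: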
2014-12-24 | 50 lp clicks | 21 offer clicks | $600 // srv1 + srv2
  srv1 | 20 lp clicks | 11 offer clicks | $400 // campaign1 + campaign2
    campaign1 | 10 lp clicks | 6 offer clicks | $100
    campaign2 | 10 lp clicks | 5 offer clicks | $300
  srv2 | 30 lp clicks | 10 offer clicks | $200 // campaign3 + campaign4
    campaign3 | 20 lp clicks | 4 offer clicks | $100
    campaign4 | 10 lp clicks | 6 offer clicks | $100
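To make the shape of that report concrete, here is a minimal plain-JavaScript sketch of how flat, already-counted rows could be rolled up into the nested totals shown above. The `rollup` helper and the row field names (`lpClicks`, `offerClicks`, `amount`) are hypothetical, chosen only for illustration; the numbers are the ones from the sample report.

```javascript
// Hypothetical helper: roll flat rows up into nested per-group totals.
// Every level along the groupFields path accumulates the row's counters.
function rollup(rows, groupFields) {
  const newNode = () => ({ lpClicks: 0, offerClicks: 0, amount: 0, children: new Map() });
  const root = newNode();
  for (const row of rows) {
    let node = root;
    for (const field of groupFields) {
      const key = row[field];
      if (!node.children.has(key)) {
        node.children.set(key, newNode());
      }
      node = node.children.get(key);
      node.lpClicks += row.lpClicks;
      node.offerClicks += row.offerClicks;
      node.amount += row.amount;
    }
  }
  return root.children;
}

// Leaf rows taken from the sample report above.
const rows = [
  { dateVisited: '2014-12-24', serverId: 'srv1', campaignId: 'campaign1', lpClicks: 10, offerClicks: 6, amount: 100 },
  { dateVisited: '2014-12-24', serverId: 'srv1', campaignId: 'campaign2', lpClicks: 10, offerClicks: 5, amount: 300 },
  { dateVisited: '2014-12-24', serverId: 'srv2', campaignId: 'campaign3', lpClicks: 20, offerClicks: 4, amount: 100 },
  { dateVisited: '2014-12-24', serverId: 'srv2', campaignId: 'campaign4', lpClicks: 10, offerClicks: 6, amount: 100 }
];

const report = rollup(rows, ['dateVisited', 'serverId', 'campaignId']);
```

Each level of the returned `Map` carries its own totals plus a `children` map for the next grouping field, which mirrors the indented rows of the report.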
Currently I use the following query to extract the report, but it is extremely slow:
db.clicks.aggregate([
  {$match: {'_id.dateVisited': '2014-12-24'}},
  {$group: {
    _id: '$_id.dateVisited',
    totalLandingpageClicksCount: {$sum: '$value.landingpageClicksCount'},
    totalOfferTrackingsCount: {$sum: '$value.offerTrackingsCount'},
    totalOfferTrackingsAmount: {$sum: '$value.offerTrackingsAmount'}
  }}
])
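For reference, the same pipeline could be generalized to group by several fields at once by using a compound `_id` in the `$group` stage. The sketch below only builds the pipeline array; `buildReportPipeline` is a hypothetical helper, and it assumes the group fields live under `_id` and the counters under `value`, as in the query above. It does not by itself address the speed problem.

```javascript
// Hypothetical helper: build a pipeline that groups by any list of _id sub-fields.
// Counter names under `value` are copied from the query above.
function buildReportPipeline(dateFrom, dateTo, groupFields) {
  const groupId = {};
  for (const field of groupFields) {
    groupId[field] = '$_id.' + field; // e.g. serverId -> '$_id.serverId'
  }
  return [
    { $match: { '_id.dateVisited': { $gte: dateFrom, $lte: dateTo } } },
    { $group: {
        _id: groupId,
        totalLandingpageClicksCount: { $sum: '$value.landingpageClicksCount' },
        totalOfferTrackingsCount: { $sum: '$value.offerTrackingsCount' },
        totalOfferTrackingsAmount: { $sum: '$value.offerTrackingsAmount' }
    } }
  ];
}

const pipeline = buildReportPipeline(
  '2014-12-24', '2014-12-24', ['dateVisited', 'serverId', 'campaignId']
);
// In the mongo shell this would be run as: db.clicks.aggregate(pipeline)
```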
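My idea is to create a separate collection for every possible combination of fields and use find({<search + group fields>}) instead of aggregation. That is, if the user requests a report for a specific date interval, grouped by serverId and then by campaignId, the following queries would be used: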
// example of doc in dateVisited_serverId collection
{
  _id: {
    dateVisited: '...',
    serverId: '..'
  },
  value: {
    <counts>
  }
}
// get stats for date, grouped by serverId
db.dateVisited_serverId.find({
  '_id.dateVisited': {
    '$gte': dateFrom,
    '$lte': dateTo
  }
})
// example of doc in dateVisited_serverId_campaignId collection
{
  _id: {
    dateVisited: '...',
    serverId: '..',
    campaignId: '..'
  },
  value: {
    <counts>
  }
}
// get stats for date, grouped by serverId and then by campaignId
// (both fields live inside _id in this collection, so they are addressed as _id.<field>)
db.dateVisited_serverId_campaignId.find({
  '_id.dateVisited': {
    '$gte': dateFrom,
    '$lte': dateTo
  },
  '_id.serverId': {$in: [<server ids from previous query>]}
})
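This would work, but the clicks collection has 18 fields, so I would have to generate 245760 collections to implement my idea.
So I need to find another design for my database.
[UPDATE] Example of a real document: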
{
  "_id": {
    "dateVisited": ISODate("2014-11-05T00:00:00.0Z"),
    "campaignId": "4c29dc888be98a9488e6876133852c72",
    "landingpageId": "c5557aedab04ad1444b0ee28b5ddaab9",
    "offerId": null,
    "trafficAccountId": "84d06369b9872e9a2685483b7a532a10",
    "serverId": "32",
    "browser": "Safari",
    "platform": "Android",
    "c1": "chat",
    "c2": "au",
    "c3": "12b-ad1a",
    "c4": "mtv2",
    "city": "Perth",
    "country": "Australia",
    "deviceType": "mobile",
    "isp": "Telstra Internet",
    "netspeedId": NumberLong(3),
    "set": ""
  },
  "value": {
    "lpCount": 2,
    "offersCount": 0,
    "grandConversionCount": 0,
    "grandConversionAmount": 0
  }
}
Answer 0 (score: 0)
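You didn't mention whether you have any indexes. If you don't, build an index on dateVisited and check your performance again.
Depending on your site's traffic, you may want to drop the index during normal use, because it makes inserts slower, and build it only when you need to create a report.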
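A minimal sketch of that build-then-drop approach, assuming the date lives at `_id.dateVisited` as in the real document above. `withReportIndex` is a hypothetical helper; `db` stands for the mongo shell's database handle, and `createIndex` / `dropIndex` are the standard shell collection methods.

```javascript
// Hypothetical helper: build the date index, run a report, then drop the index
// again so that regular inserts do not pay for maintaining it.
function withReportIndex(db, runReport) {
  const spec = { '_id.dateVisited': 1 }; // ascending index on the date sub-field
  db.clicks.createIndex(spec);
  try {
    return runReport(db);
  } finally {
    db.clicks.dropIndex(spec); // dropped even if the report throws
  }
}

// Possible use in the mongo shell (not executed here):
// withReportIndex(db, function (d) {
//   return d.clicks.aggregate([{ $match: { '_id.dateVisited': '2014-12-24' } }]).toArray();
// });
```

Building an index over more than 10 million documents is itself expensive, so whether rebuilding it per report pays off depends on how often reports are requested.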