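I am trying to evaluate which database system to use for a new project.
At the moment I am comparing MySQL and MongoDB for the task at hand.
I have 5 million records with 500 numeric fields each, and I need to serve this data at different levels of granularity for plotting graphs.
I loaded the data into both MongoDB and MySQL. In MySQL I generated temporary tables with 1/10, 1/100, and 1/1000 granularity. The application then picks the table that best matches the current task and queries the data from there.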
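The table selection can be sketched roughly as follows. This is only an illustration: the function name pickTable, the maxPoints parameter, the idea of falling back to the raw project1 table, and the table name project1_1000 are assumptions and are not taken from the actual application.

```javascript
// Hypothetical list of tables, ordered from finest to coarsest.
// project1_1000 is an assumed name for the 1/1000 table.
var TABLES = [
    { factor: 1,    name: 'project1' },
    { factor: 10,   name: 'project1_10' },
    { factor: 100,  name: 'project1_100' },
    { factor: 1000, name: 'project1_1000' }
];

// rawRows:   number of raw records inside the requested time range.
// maxPoints: how many points the plot can usefully display.
function pickTable(rawRows, maxPoints) {
    for (var i = 0; i < TABLES.length; i++) {
        // Take the finest table that still fits the point budget.
        if (rawRows / TABLES[i].factor <= maxPoints) {
            return TABLES[i].name;
        }
    }
    // Nothing is coarse enough: fall back to the coarsest table.
    return TABLES[TABLES.length - 1].name;
}
```

For a range covering 50,000 raw records and a plot that shows at most 1,000 points, this would select project1_100.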
With this technique I can get the data fast enough (< 100 ms). The SQL query I use is:
SELECT from_unixtime(CAST(FLOOR(MIN(STAMP/1000)) AS SIGNED INTEGER)),
MIN(RING),MIN(STATE),CAST(FLOOR(MIN(STAMP)) as SIGNED INTEGER),AVG(w21030401)
FROM project1 GROUP BY FLOOR((stamp - 1181589892000)/60000);
I use the same query to create the temporary tables. The only difference is that there are 350 wXXXXXX fields.
INSERT INTO project1_10 (TTIME,RING,STATE,STAMP,w21030401,.........)
SELECT from_unixtime(CAST(FLOOR(MIN(STAMP/1000)) AS SIGNED INTEGER)),
MIN(RING),MIN(STATE),CAST(FLOOR(MIN(STAMP)) as SIGNED INTEGER),AVG(w21030401),.......
FROM project1 GROUP BY FLOOR((stamp - 1181589892000)/60000);
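The GROUP BY expression in the statements above puts every row into a one-minute bucket (60000 ms) counted from a fixed origin timestamp. A minimal JavaScript sketch of the same arithmetic, run on in-memory rows, looks like this. The function names bucketOf and aggregate are hypothetical, and the sketch assumes the averaged field is present on every row, whereas SQL AVG skips NULLs.

```javascript
// Origin and bucket width taken from the GROUP BY expression in the query.
var ORIGIN_MS = 1181589892000;
var BUCKET_MS = 60000; // 60,000 ms = one minute

// Same arithmetic as FLOOR((stamp - 1181589892000) / 60000).
function bucketOf(stamp) {
    return Math.floor((stamp - ORIGIN_MS) / BUCKET_MS);
}

// Hypothetical in-memory version of the GROUP BY: the minimum STAMP and
// the average of one field per bucket.
function aggregate(rows, field) {
    var buckets = {};
    rows.forEach(function (row) {
        var b = bucketOf(row.STAMP);
        if (!buckets[b]) {
            buckets[b] = { STAMP: row.STAMP, sum: 0, count: 0 };
        }
        buckets[b].STAMP = Math.min(buckets[b].STAMP, row.STAMP);
        buckets[b].sum += row[field];
        buckets[b].count += 1;
    });
    return Object.keys(buckets).map(function (b) {
        return {
            bucket: Number(b),
            STAMP: buckets[b].STAMP,
            avg: buckets[b].sum / buckets[b].count
        };
    });
}
```

Two rows that are 30 seconds apart and start at the origin land in bucket 0, while a row exactly 60 seconds after the origin starts bucket 1.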
Then I tried to do the same thing with MongoDB. I loaded all the data into MongoDB and ended up with 48 million documents of the form:
{ "_id" : ObjectId("50040b3f0cf2872a8d3af90d"), "TTIME" :
ISODate("2008-11-30T06:40:07Z"), "STAMP" : NumberLong("1228027207000"),
"STATE" : 2531, "RING" : 1, "w13010096" : 34.991, "w13010097" : 1.432,
"w23010001" : 292, "w18030180" : 84, "w18030380" : 95, "w21030002" : 51.113,
"w21030005" : 60.321, "w21030004" : 274.662, "w21030008" : 149.629,
"w21030009" : 126.565, "w21030010" : 576.296, ........... }
Then I tried to generate the temporary documents with the following mapReduce:
keylist = [ 'w21030401', 'w13011114', .... ];
m = function (){
    var result = {};
    result['STAMP'] = this['STAMP'];
    result['RING'] = this['RING'];
    result['TTIME'] = this['TTIME'];
    result['STATE'] = this['STATE'];
    for(var key in keylist){
        if(key in this) {
            result[key] = this[key];
            result['cnt_' + key] = 1;
        }
    }
    var zone = Math.floor((this['STAMP'] - 1171004118000) / 1000000);
    emit( zone , result );
};
r = function (name, values){
    var result = {};
    result['STAMP'] = values[0]['STAMP'];
    result['RING'] = values[0]['RING'];
    result['TTIME'] = values[0]['TTIME'];
    result['STATE'] = values[0]['STATE'];
    for(var key in keylist) {
        result[key] = 0;
        result['cnt_' + key] = 0;
    }
    for ( var i=0; i<values.length; i++ ) {
        if(values[i]['STAMP'] < result['STAMP']) {
            result['STAMP'] = values[i]['STAMP'];
            result['TTIME'] = values[i]['TTIME'];
        }
        if(values[i]['RING'] < result['RING']) {
            result['RING'] = values[i]['RING'];
        }
        if(values[i]['STATE'] < result['STATE']) {
            result['STATE'] = values[i]['STATE'];
        }
        for(var key in keylist) {
            if(key in values[i]) {
                result[key] += values[i][key];
                result['cnt_' + key] += values[i]['cnt_' + key];
            }
        }
    }
    return result;
};
f = function(who, val){
    var result = {};
    result['STAMP'] = val['STAMP'];
    result['RING'] = val['RING'];
    result['TTIME'] = val['TTIME'];
    result['STATE'] = val['STATE'];
    for(var key in keylist) {
        if(key in val) {
            result[key] = val[key]/val['cnt_'+key];
        }
    }
    return result;
};
db.project1.mapReduce( m, r, { finalize : f, scope: { keylist: keylist }, out : {replace : 'project1_100'} , jsMode : false });
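To check the map, reduce, and finalize logic without waiting for the full job, the same steps can be dry-run on a handful of in-memory documents in plain JavaScript. The sketch below is a simplified stand-in and not the job above: the sample documents are made up, RING, TTIME, and STATE are left out, and the helper names (zoneOf, mapDoc, reduceParts, finalizePart, dryRun) are hypothetical. It walks the key list by value with forEach, because for...in over an array yields the index strings '0', '1', ... rather than the stored field names, which may be worth checking in the job itself.

```javascript
// Hypothetical sample; real documents carry many more wXXXXXXXX fields.
var keylist = ['w21030401', 'w13011114'];
var docs = [
    { STAMP: 1171004118000, w21030401: 10, w13011114: 1 },
    { STAMP: 1171004119000, w21030401: 20 },
    { STAMP: 1171005118000, w21030401: 40, w13011114: 3 }
];

// for...in over an array yields index strings, not the stored values.
var seen = [];
for (var k in keylist) { seen.push(k); }

// Same zone arithmetic as in the map function above.
function zoneOf(doc) {
    return Math.floor((doc.STAMP - 1171004118000) / 1000000);
}

// Map step: one partial result per document, with a count per present field.
function mapDoc(doc) {
    var out = { STAMP: doc.STAMP };
    keylist.forEach(function (key) {
        if (key in doc) {
            out[key] = doc[key];
            out['cnt_' + key] = 1;
        }
    });
    return out;
}

// Reduce step: keep the minimum STAMP, sum values and counts.
function reduceParts(parts) {
    var out = { STAMP: parts[0].STAMP };
    parts.forEach(function (p) {
        if (p.STAMP < out.STAMP) { out.STAMP = p.STAMP; }
        keylist.forEach(function (key) {
            if (key in p) {
                out[key] = (out[key] || 0) + p[key];
                out['cnt_' + key] = (out['cnt_' + key] || 0) + p['cnt_' + key];
            }
        });
    });
    return out;
}

// Finalize step: turn sums into averages.
function finalizePart(p) {
    var out = { STAMP: p.STAMP };
    keylist.forEach(function (key) {
        if (key in p) { out[key] = p[key] / p['cnt_' + key]; }
    });
    return out;
}

// Group the mapped documents by zone, then reduce and finalize each group.
function dryRun(documents) {
    var groups = {};
    documents.forEach(function (d) {
        var z = zoneOf(d);
        (groups[z] = groups[z] || []).push(mapDoc(d));
    });
    var result = {};
    Object.keys(groups).forEach(function (z) {
        result[z] = finalizePart(reduceParts(groups[z]));
    });
    return result;
}
```

Calling dryRun(docs) returns one averaged record per zone, keyed by zone number, which can be compared by hand against the sample values.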
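MySQL takes 210 seconds to create the temporary table; MongoDB takes about 4 hours.
My question is: is MongoDB simply not suited to my problem, do I need bigger hardware for MongoDB than for MySQL, or am I doing something wrong in my MapReduce?
Thanks,
Peter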