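I'm looking for a very fast way to search an array.
What I actually need: check a list of emails against a blacklist that is given as a csv string.
My attempts so far, for each email: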
Use blacklist.indexOf(email) >= 0 directly on the csv string - very slow.
"email1@gmail.com;email2@gmail.com ..."
Split the blacklist into an array and use array.indexOf(email) >= 0 - faster.
["email1@gmail.com","email2@gmail.com" ...]
Create an object in which every blacklisted email is a property set to true, then check myObject[email] - this seems much faster, but it looks like a hack.
{"email1@gmail.com":true,"email2@gmail.com":true ...}
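How can I make this search fast without it being a hack?
PS: the size of the blacklist is not the problem; it has almost 1k emails. But we have to check 400k emails each time.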
Answer 0 (score: 1)
I would say it is best to use a pre-populated Map.
You can split the csv string and iterate over it.
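A minimal sketch of that idea, assuming the blacklist arrives as a semicolon-separated string like the one in the question. The helper names `buildBlacklist` and `isBlacklisted` are hypothetical, and the trimming and lower-casing are assumptions rather than something the question asks for.

```javascript
// Hypothetical helper: turn "a@x.com;b@y.com" into a Map for fast key lookups.
function buildBlacklist(csv, separator = ';') {
  const blacklist = new Map();
  for (const raw of csv.split(separator)) {
    // Normalization (trim + lower-case) is an assumption, not part of the question.
    const email = raw.trim().toLowerCase();
    if (email !== '') {
      blacklist.set(email, true);
    }
  }
  return blacklist;
}

// Hypothetical helper: look up one address in the pre-populated Map.
function isBlacklisted(blacklist, email) {
  return blacklist.has(email.trim().toLowerCase());
}

const exampleBlacklist = buildBlacklist('email1@gmail.com;email2@gmail.com');
console.log(isBlacklisted(exampleBlacklist, 'email2@gmail.com')); // true
console.log(isBlacklisted(exampleBlacklist, 'someone@example.com')); // false
```

The Map is built once, so the cost of splitting the string is paid a single time rather than once per checked email.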
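I wrote two performance tests and ran them in Chrome, timing them with the help of https://developer.mozilla.org/en-US/docs/Web/API/Performance/measure.
For the first test I created two maps: an email map with 400k entries and a blacklist map with 1k entries. Downside: initialization takes a long time.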
// noprotect
console.clear();
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;
// maps
const emails = new Map();
const blacklist = new Map();
// 1k blacklisted mails: even entries match an email, odd entries ('email@...') never do
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
  if (bl % 2 === 0) {
    blacklist.set('email' + bl, 'email' + bl);
  } else {
    blacklist.set('email@' + bl, 'email@' + bl);
  }
}
// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
  emails.set('email' + j, 'email' + j);
}
performance.mark('perfMailList-start');
// 1ms (has, blacklist)
blacklist.forEach(blacklistItem => {
  if (emails.has(blacklistItem)) {
    mailMatches++;
  }
});
// 32ms (has, emails)
/*emails.forEach(email => {
  if (blacklist.has(email)) {
    mailMatches++;
  }
})*/
performance.mark('perfMailList-end');
performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');
const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];
console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);
// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
And a second test with some loops (for, reverse, forEach) that alternate between includes and indexOf.
// noprotect
console.clear();
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;
// arrays
const emails = [];
const blacklist = [];
// 1k blacklisted mails: even entries match an email, odd entries ('email@...') never do
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
  if (bl % 2 === 0) {
    blacklist.push('email' + bl);
  } else {
    blacklist.push('email@' + bl);
  }
}
// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
  emails.push('email' + j);
}
performance.mark('perfMailList-start');
// Uncomment exactly one of the blocks below to measure it.
// 524ms (indexOf, emails)
/*emails.forEach(mail => {
  if (blacklist.indexOf(mail) >= 0) {
    mailMatches++;
  }
})*/
// 583ms (indexOf, blacklist)
/*blacklist.forEach(blacklistItem => {
  if (emails.indexOf(blacklistItem) >= 0) {
    mailMatches++;
  }
})*/
// --------------------------
// 521ms (includes, emails)
/*emails.forEach(mail => {
  if (blacklist.includes(mail)) {
    mailMatches++;
  }
})*/
// 600ms (includes, blacklist)
/*blacklist.forEach(blacklistItem => {
  if (emails.includes(blacklistItem)) {
    mailMatches++;
  }
})*/
// --------------------------
// 638ms (includes, blacklist, reverse)
/*for (var i = BLACKLIST_EMAIL_COUNT; i--;) {
  if (emails.includes(blacklist[i])) {
    mailMatches++;
  }
}*/
// 632ms (indexOf, blacklist, reverse)
/*for (var i = BLACKLIST_EMAIL_COUNT; i--;) {
  if (emails.indexOf(blacklist[i]) >= 0) {
    mailMatches++;
  }
}*/
// --------------------------
// 530ms (includes, emails, reverse)
/*for (var i = EMAIL_COUNT; i--;) {
  if (blacklist.includes(emails[i])) {
    mailMatches++;
  }
}*/
// 530ms (indexOf, emails, reverse)
/*for (var i = EMAIL_COUNT; i--;) {
  if (blacklist.indexOf(emails[i]) >= 0) {
    mailMatches++;
  }
}*/
// --------------------------
// 525ms (includes, emails)
/*for (let i = 0; i < EMAIL_COUNT; i++) {
  if (blacklist.includes(emails[i])) {
    mailMatches++;
  }
}*/
// 540ms (indexOf, emails)
/*for (let i = 0; i < EMAIL_COUNT; i++) {
  if (blacklist.indexOf(emails[i]) >= 0) {
    mailMatches++;
  }
}*/
// --------------------------
// 668ms (includes, blacklist)
/*for (let i = 0; i < BLACKLIST_EMAIL_COUNT; i++) {
  if (emails.includes(blacklist[i])) {
    mailMatches++;
  }
}*/
// 687ms (indexOf, blacklist)
/*for (let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
  if (emails.indexOf(blacklist[k]) >= 0) {
    mailMatches++;
  }
}*/
// --------------------------
// 1367ms (equals, nested loops)
/*for (let i = 0; i < EMAIL_COUNT; i++) {
  for (let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
    if (emails[i] === blacklist[k]) {
      mailMatches++;
    }
  }
}*/
performance.mark('perfMailList-end');
performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');
const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];
console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);
// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
Test machine: MacBook Pro (15-inch, 2016)
Processor: 2.9 GHz Intel Core i7
Memory: 16 GB 2133 MHz LPDDR3
Answer 1 (score: -1)
Use Array#includes and let the engine implementers worry about optimization:
blacklist.includes(email)
Alternatively, use a Set or a Map:
https://jsperf.com/array-includes-and-find-methods-vs-set-has
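For the Set option, a rough sketch of how it could look. The function name `splitByBlacklist` and the sample addresses are made up for illustration; the only built-ins used are the `Set` constructor and `Set.prototype.has`.

```javascript
// Hypothetical helper: separate incoming emails into allowed and blocked lists.
function splitByBlacklist(emails, blacklistedEmails) {
  // Built once, then reused for every lookup.
  const blacklist = new Set(blacklistedEmails);
  const allowed = [];
  const blocked = [];
  for (const email of emails) {
    if (blacklist.has(email)) {
      blocked.push(email);
    } else {
      allowed.push(email);
    }
  }
  return { allowed, blocked };
}

const exampleResult = splitByBlacklist(
  ['a@example.com', 'email1@gmail.com', 'b@example.com'],
  ['email1@gmail.com', 'email2@gmail.com']
);
console.log(exampleResult.blocked); // contains only 'email1@gmail.com'
console.log(exampleResult.allowed); // contains 'a@example.com' and 'b@example.com'
```

A Set fits here because only membership matters; a Map would be the choice if each blacklisted address also needed an associated value.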