Performant and "right" way to search for a string in an array in JavaScript

Time: 2018-09-26 13:53:58

Tags: javascript

I am looking for a very fast way to search an array.

What I actually need: check a series of emails against a blacklist, with the data given as a CSV string.

My approaches, applied to each email:

  1. Use blacklist.indexOf(email) >= 0 on the raw string: very slow. The string I tried looked like this:

    "email1@gmail.com;email2@gmail.com ..."

  2. Split the blacklist into an array and use array.indexOf(email) >= 0: faster.

    ["email1@gmail.com","email2@gmail.com" ...]

  3. Create an object in which every property is an email from the blacklist, set to true, and then check myObject[email]. This seems much faster, but it looks like a "hack".

    {"email1@gmail.com":true,"email2@gmail.com":true ...}
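The three approaches above can be sketched side by side on a tiny, made-up blacklist. The ";" separator follows the example string; the names blacklistCsv, inString, inArray and inObject are hypothetical. The sketch also shows that approach 1 is not only slow but inexact: a substring search on the raw string reports a hit for any email that is merely part of a longer entry.

```javascript
// Hypothetical blacklist, ";"-separated as in the question's example.
const blacklistCsv = "email1@gmail.com;email2@gmail.com";

// Approach 1: substring search on the raw string.
// Scans the whole string per call and also matches partial entries.
const inString = (email) => blacklistCsv.indexOf(email) >= 0;

// Approach 2: split once into an array, then a linear search per email.
const blacklistArray = blacklistCsv.split(";");
const inArray = (email) => blacklistArray.indexOf(email) >= 0;

// Approach 3: plain object used as a dictionary (email -> true).
const blacklistObject = {};
for (const email of blacklistArray) {
    blacklistObject[email] = true;
}
// Comparing with === true (not just truthiness) ignores inherited
// properties such as "toString", whose value is a function.
const inObject = (email) => blacklistObject[email] === true;

console.log(inString("mail1@gmail.com"));  // true (false positive: substring of "email1@...")
console.log(inArray("mail1@gmail.com"));   // false
console.log(inObject("email2@gmail.com")); // true
```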

How can I make the search fast without it being a "hack"?

PS: The problem isn't the size of the blacklist, which has almost 1k emails. But we have to check 400k emails every time.
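A back-of-the-envelope count explains the gap between approaches 2 and 3. It assumes most emails are not blacklisted, so each linear search scans the whole list, and it ignores string length and hashing constants:

```latex
\underbrace{400\,000}_{\text{emails}} \times \underbrace{1\,000}_{\text{blacklist entries}}
  = 4 \times 10^{8}\ \text{string comparisons (linear search, per run)}
\quad\text{vs.}\quad
400\,000\ \text{expected } O(1)\ \text{lookups (hash-based Set, Map or object)}
```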

2 answers:

Answer 0 (score: 1)

I'd say it's best to use a prefilled Map. You can split the CSV string and iterate over it. I wrote two performance tests and ran them in Chrome, with the help of https://developer.mozilla.org/en-US/docs/Web/API/Performance/measure.
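A minimal sketch of the suggestion to split the CSV string into a prefilled Map, assuming the blacklist arrives as one ";"-separated string as in the question (pass "," for a true comma-separated file). buildBlacklistMap is a hypothetical helper name; trimming whitespace and skipping empty entries are assumptions about the input format.

```javascript
// Hypothetical helper: turn the raw blacklist string into a Map for fast lookups.
// Assumes ";" as the default separator, as in the question's example string.
function buildBlacklistMap(csv, separator = ";") {
    const map = new Map();
    for (const raw of csv.split(separator)) {
        const email = raw.trim();   // assumption: surrounding whitespace is noise
        if (email !== "") {         // skip empty entries, e.g. from a trailing ";"
            map.set(email, true);   // the value is unused; only the key matters
        }
    }
    return map;
}

const blacklistMap = buildBlacklistMap("email1@gmail.com; email2@gmail.com;");
console.log(blacklistMap.size);                    // 2
console.log(blacklistMap.has("email2@gmail.com")); // true
console.log(blacklistMap.has("email3@gmail.com")); // false
```

Since the stored value is never read, a Set would carry the same information with less noise.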

I created two maps: an email map with 400k entries and a blacklist map with 1k entries. Downside: initialization takes a long time.

// noprotect
console.clear();

const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;

// maps
const emails = new Map();
const blacklist = new Map();

// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
    if (bl % 2 === 0) {
        blacklist.set('email' + bl, 'email' + bl);
    } else {
        blacklist.set('email@' + bl, 'email@' + bl);
    }
}

// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
    emails.set('email' + j, 'email' + j);
}

performance.mark('perfMailList-start');

// 1ms (Map#has: iterate the 1k blacklist, look up in the 400k emails)
blacklist.forEach(blacklistItem => {
    if (emails.has(blacklistItem)) {
        mailMatches++;
    }
});

// 32ms (Map#has: iterate the 400k emails, look up in the 1k blacklist)
/*emails.forEach(email => {
    if(blacklist.has(email)) {
        mailMatches++;
    }
})*/

performance.mark('perfMailList-end');

performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');

const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];

console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);

// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
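In this test only the blacklist actually needs a lookup structure: the 400k emails are just iterated, so they can stay in a plain array, which also avoids most of the initialization cost mentioned above. A minimal standalone sketch under that assumption, reusing the same synthetic data and swapping the Map for a Set; timings depend on machine and engine, so none are claimed here.

```javascript
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;

// Same synthetic data as the test above: the even-numbered entries match.
const blacklistSet = new Set();
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
    blacklistSet.add(bl % 2 === 0 ? 'email' + bl : 'email@' + bl);
}

// The emails to check stay a plain array; no Map is built for them.
const emailList = [];
for (let j = 0; j < EMAIL_COUNT; j++) {
    emailList.push('email' + j);
}

const start = performance.now();
// One expected O(1) Set lookup per email.
const blacklisted = emailList.filter(email => blacklistSet.has(email));
const elapsed = performance.now() - start;

console.log(`${elapsed.toFixed(1)}ms and ${blacklisted.length} found blacklisted mails`);
```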

My second test: several loops (for, reverse for, forEach) alternating between includes and indexOf on plain arrays:

// noprotect
console.clear();

const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;

// arrays
const emails = [];
const blacklist = [];

// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
    if (bl % 2 === 0) {
        blacklist.push('email' + bl);
    } else {
        blacklist.push('email@' + bl);
    }
}

// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
    emails.push('email' + j);
}

performance.mark('perfMailList-start');

// 524ms (indexOf, emails)
/*emails.forEach(mail => {
    if(blacklist.indexOf(mail) >= 0){
        mailMatches++;
    }
})*/

// 583ms (indexOf, blacklist)
/*blacklist.forEach(blacklistItem => {
    if(emails.indexOf(blacklistItem) >= 0){
        mailMatches++;
    }
})*/

// --------------------------

// 521ms (includes, emails)
/*emails.forEach(mail => {
    if(blacklist.includes(mail)){
        mailMatches++;
    }
})*/

// 600ms (includes, blacklist)
/*blacklist.forEach(blacklistItem => {
    if(emails.includes(blacklistItem)){
        mailMatches++;
    }
})*/

// --------------------------

// 638ms (includes, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
    if(emails.includes(blacklist[i])){
        mailMatches++;
    }
}*/

// 632ms (indexOf, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
    if(emails.indexOf(blacklist[i]) >= 0){
        mailMatches++;
    }
}*/

// --------------------------

// 530ms (includes, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
    if(blacklist.includes(emails[i])){
        mailMatches++;
    }
}*/

// 530ms (indexOf, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
    if(blacklist.indexOf(emails[i]) >= 0){
        mailMatches++;
    }
}*/

// --------------------------

// 525ms (includes, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    if(blacklist.includes(emails[i])) {
        mailMatches++;
    }
}*/

// 540ms (indexOf, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    if(blacklist.indexOf(emails[i]) >= 0) {
        mailMatches++;
    }
}*/

// --------------------------

// 668ms (includes, blacklist)
/*for(let i = 0; i < BLACKLIST_EMAIL_COUNT; i++) {
    if(emails.includes(blacklist[i])) {
        mailMatches++;
    }
}*/

// 687ms (indexOf, blacklist)
/*for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
    if(emails.indexOf(blacklist[k]) >= 0) {
        mailMatches++;
    }
}*/

// --------------------------

// 1367ms (equals)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
        if(emails[i] === blacklist[k]) {
            mailMatches++;
        }
    }
}*/

performance.mark('perfMailList-end');

performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');

const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];

console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);

// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
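Instead of commenting variants in and out between runs, a small harness can time each one in the same run and check that they all agree on the match count. This is a sketch: timeVariant is a hypothetical helper, the email count is reduced to 40k (an assumption) so the linear-search variants finish quickly, and a single performance.now() measurement without warm-up is only a rough indicator.

```javascript
// Hypothetical helper: run fn once, log elapsed time and its result.
function timeVariant(label, fn) {
    const start = performance.now();
    const matches = fn();
    const ms = performance.now() - start;
    console.log(`${label}: ${ms.toFixed(1)}ms, ${matches} matches`);
    return matches;
}

// Reduced sizes (assumption) so the linear-search variants stay quick.
const EMAILS = 40000;
const BLACKLIST = 1000;

const emailArr = Array.from({ length: EMAILS }, (_, j) => 'email' + j);
const blacklistArr = Array.from({ length: BLACKLIST }, (_, bl) =>
    bl % 2 === 0 ? 'email' + bl : 'email@' + bl);
const blacklistLookup = new Set(blacklistArr);

// Each variant iterates the emails and searches the blacklist.
const results = [
    timeVariant('indexOf', () => emailArr.filter(e => blacklistArr.indexOf(e) >= 0).length),
    timeVariant('includes', () => emailArr.filter(e => blacklistArr.includes(e)).length),
    timeVariant('Set#has', () => emailArr.filter(e => blacklistLookup.has(e)).length),
];
```

All variants should report the same match count; if one does not, the loop under test is not doing what its label says.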

Test machine: MacBook Pro (15-inch, 2016)

Processor: 2.9 GHz Intel Core i7

Memory: 16 GB 2133 MHz LPDDR3

Answer 1 (score: -1)

Use Array#includes and let the engine implementers worry about optimization:

blacklist.includes(email)

Alternatively, use a Set or a Map:

https://jsperf.com/array-includes-and-find-methods-vs-set-has
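On the question's "hack" concern: a plain object used as a dictionary also inherits keys from Object.prototype, so a bare truthiness check like myObject[email] reports a hit for a string such as "toString". A Set, or an object created with Object.create(null), has no inherited keys. A minimal illustration with a hypothetical two-entry blacklist; real email addresses contain "@" and will not collide with these names, but the Set states the intent directly.

```javascript
const entries = ["email1@gmail.com", "email2@gmail.com"];

// Plain object as a dictionary (approach 3 from the question).
const asObject = {};
for (const e of entries) asObject[e] = true;

// Set: pure membership, no inherited keys.
const asSet = new Set(entries);

// Prototype-free object: no inherited keys either.
const asBareObject = Object.create(null);
for (const e of entries) asBareObject[e] = true;

console.log(Boolean(asObject["toString"]));     // true (inherited from Object.prototype)
console.log(asSet.has("toString"));             // false
console.log(Boolean(asBareObject["toString"])); // false
console.log(asSet.has("email1@gmail.com"));     // true
```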