如何从字符串数组中以任何顺序匹配和突出显示所有术语?

时间:2017-09-22 12:50:01

标签: javascript regex algorithm search highlighting

要求是:

  • 查找数组中的字符串(此处称为选项),其中包含所有 ANY 顺序
  • 中的字词
  • 正确地突出匹配条款 - 即。在每个匹配字词之前和之后插入一个字符串 - 我在这里使用<u></u>
  • 搜索查询和选项都可以是&#34;任何&#34;

为了简单起见,答案可以通过仅包含ASCII字符的列表集中于不区分大小写的搜索,并假设术语分隔符是一个普通空格 - 即。输入的查询为&#34; Foo bar baz&#34;表示搜索字词为['foo', 'bar', 'baz']

澄清:

  • 所有术语必须在匹配的选项中彼此分开存在 - 即。较短的术语不应仅作为较长术语的子串而不应存在两个术语
  • 重复搜索字词必须至少与查询
  • 一样多

最终的应用程序(可能并不奇怪)是一种自动完成的。

  

TL; DR

     

最近的小提琴并排比较所提出的算法:
  https://jsfiddle.net/Mikk3lRo/ndeuqn02/7/   
(如果添加新算法,请随时更新此链接)

     

jsPerf以更现实/更具代表性的方式比较算法 - 几个字符串基本上是&#34;输入&#34;每个代表一次一个字符:
  https://jsperf.com/comparison-of-algorithms-to-search-and-highlight

     

此时很清楚(由于trincot的优秀基础比较),原始实现所使用的大部分时间都用在DOM输出上。它的重要性尽可能地在小提琴中被最小化。

     

每次搜索时算法之间的性能仍然存在明显差异,但在每次击键时,其中没有一种算法始终快速 est 。在重新审视并清理了我自己的“分而治之”之后#34;在我尝试的任何现实场景中,它确实优于其他人。

     

Tigregalis介绍了预运行优化的想法,这似乎是合理的,因为选项不太可能在击键之间发生变化。我在这里为所有方法添加了(一个函数)。我从中看到一个明显的好处的唯一算法是在Skirtle的排列中,但我鼓励每个回答者考虑它是否对他们自己的算法有用。

     

某些算法比其他算法更容易适应。我仍然认为这比实际实施中的微小性能差异更重要。

     

请注意当前版本的Tigregalis&#39;收缩集有一个错误 - 我已经将它从小提琴和jsperf中排除,直到它被修复。

病毒排列

理论上,这可以通过&#34;手动&#34;构建一个RegExp,它包含由捕获组分隔的搜索项的每个可能的排列,以捕获术语之间的任何内容 - 搜索&#34; foo bar&#34;结果为(foo)(.*?)(bar)|(bar)(.*?)(foo)

然后使用string.replace()一次性完成突出显示。如果字符串中有任何更改,我们就会匹配。

演示:

&#13;
&#13;
var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D\'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People\'s Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People\'s Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];
var terms, terms_esc;
function viral_permutations() {
    var t0, t1, i, permuted, res_elm, meta_elm, regex_string, regex, li, option, match_groups, highlighted;
    meta_elm = document.getElementById('viral_permutations_meta');
    res_elm = document.getElementById('viral_permutations_result');
    res_elm.innerHTML = meta_elm.innerHTML = '';
    
    t0 = performance.now();
    
    //Split query in terms at delimiter and lowercase them
    terms = document.getElementById('viral_permutations').value.split(/\s/).filter(function(n) {
        return n;
    }).map(function(term){
        return term.toLowerCase();
    });
    meta_elm.innerHTML += 'Terms: ' + JSON.stringify(terms) + '<br>';
        
    //Escape terms
    terms_esc = terms.map(function(term) {
        return term.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
    });
    
    //Wrap terms in in individual capturing groups
    terms_esc = terms.map(function(term) {
        return '(' + term + ')';
    });
    
    //Find all permutations
    permuted = permutate_array(terms_esc);
    
    //Construct a group for each permutation
    match_groups = [];
    for (var i in permuted) {
        match_groups.push(permuted[i].join('(.*?)'));
    }
    
    try {
        //Construct the final regex
        regex_string = match_groups.join('|');
        //Display it
        document.getElementById('viral_permutations_regex').innerHTML = regex_string;
        meta_elm.innerHTML += 'RegExp length: ' + regex_string.length + '<br>';
        
        regex = new RegExp(regex_string, 'i');
    

        //Loop through each option
        for (i = 0; i < options.length; i++) {
            option = options[i];

            //Replace the terms with highlighted terms
            highlighted = option.replace(regex, viral_permutations_replace);

            //If anything was changed (or the query is empty) we have a match
            if (highlighted !== option || terms.length === 0) {
                //Append it to the result list
                li = document.createElement('li');
                li.innerHTML = highlighted;
                res_elm.appendChild(li);
            }
        }

        //Print some meta
        t1 = performance.now();
        meta_elm.innerHTML += 'Time: ' + (Math.round((t1 - t0) * 100) / 100) + 'ms';
    } catch(e) {
        meta_elm.innerHTML += '<span style="color:red">' + e.message + '</span>';
    }
}

//The replacement function
function viral_permutations_replace() {
    var i, m, j, retval, m_cased, unmatched;
    retval = '';
    
    //Make a clone of the terms array (that we can modify without destroying the original)
    unmatched = terms.slice(0);
    
    //Loop arguments between the first (which is the full match) and
    //the last 2 (which are the offset and the full option)
    for (i = 1; i < arguments.length - 1; i++) {
        m = arguments[i];
        
        //Check that we have a string - most of the arguments will be undefined
        if (typeof m !== 'string') continue;
        
        //Lowercase the match
        m_cased = m.toLowerCase();
        
        //Append it to the return value - highlighted if it is among our terms
        j = unmatched.indexOf(m_cased);
        if (j >= 0) {
            //Remove it from the unmatched terms array
            unmatched.splice(j, 1);
            retval += '<u>' + m + '</u>';
        } else {
            retval += m;
        }
    }
    return retval;
}

//Helper function to return all possible permutations of an array
function permutate_array(arr) {
    var perm, perm_intern;
    perm_intern = function(perm, pre, post, n) {
        var elem, i, j, ref, rest;
        if (n > 0) {
            for (i = j = 0, ref = post.length; 0 <= ref ? j < ref : j > ref; i = 0 <= ref ? ++j : --j) {
                rest = post.slice(0);
                elem = rest.splice(i, 1);
                perm_intern(perm, pre.concat(elem), rest, n - 1);
            }
        } else {
            perm.push(pre);
        }
    };
    perm = [];
    perm_intern(perm, [], arr, arr.length);
    return perm;
}
viral_permutations();
&#13;
<input type="text" id="viral_permutations" onkeyup="viral_permutations()">
<p id="viral_permutations_meta"></p>
<pre style="width:100%;overflow:auto;background:#eee" id="viral_permutations_regex"></pre>
<ul style="height:300px;overflow:auto" id="viral_permutations_result"></ul>
&#13;
&#13;
&#13;

感谢trincot 指出我的原始版本偶尔会突出显示一个重复的术语两次 - 这在此代码段中得到修复。

失败原因是:

  • 随着术语的增加,正则表达式呈指数级增长。 7个字词(甚至是单个字母)超过250kb,我的浏览器放弃了Error: regexp too big ......

其他一些值得注意的策略:

捕获每组中包含所有术语的组:

(foo|bar)(.*)(foo|bar)

失败原因是:

  • 将匹配包含重复字词的选项 - fx。 The food in the food court会匹配,但显然不应该。
  • 如果我们&#34;仔细检查&#34;事实上,所有条款都发现无法找到有效的匹配 - fx。 The food in the food barfoo两次,而且永远不会到bar

否定前瞻和反向引用:

(foo|bar|baz)(.*?)((?!\1)(?:foo|bar|baz))(.*?)((?!\1|\3)(?:foo|bar|baz))

失败原因是:

  • 每当在查询中重复使用条款时,它都会达到不可能的条件,例如&#34;找到foobarbar,这不是foo也不是bar&#34;。
  • 我相当肯定它还有其他问题,但当我意识到它存在逻辑上的缺陷时,我就停止追求它了。

积极向前看

(?=.*foo)(?=.*bar)(?=.*baz)

失败原因是:

  • 很难(如果不是不可能)可靠地突出显示找到的匹配。
  • 我还没有找到任何方法来确保所有条款确实存在 - 即。它们可能都单独存在于期权中,但较短的期限可能仅作为较长期限的子串存在 - 或者术语可能重叠。

5 个答案:

答案 0 :(得分:6)

我试一试,但我不确定这会有多大帮助。我的方法类似于你的分而治之技术。

不是咬掉字符串的一部分,而是提前搜索每个术语并存储所有匹配的集合,记录开始和结束位置。如果某个特定搜索字词没有足够匹配,则该算法会立即为该选项提供保证。

一旦它将所有可能的匹配聚集在一起,它就会递归地尝试找到一个不重叠的组合。在那次递归中有很多数据结构的复制,我怀疑它可以比我在这里更好地进行优化。我也只能为一些变量名称道歉,我正在努力想出任何有意义的名字。

对于某些测试搜索,例如a n a n a n a n ...,它似乎比原来的Divide and Conquer技术更好地应对,但我怀疑这可能仅仅是因为找到的匹配不足时执行的早期纾困特定的搜索词。如果没有大量的实际数据,很难知道真正有价值的优化将在何处。

&#13;
&#13;
function search() {
    var options = [
        'ababababa',
        'United States',
        'United Kingdom',
        'Afghanistan',
        'Aland Islands',
        'Albania',
        'Algeria',
        'American Samoa',
        'Andorra',
        'Angola',
        'Anguilla',
        'Antarctica',
        'Antigua and Barbuda',
        'Argentina',
        'Armenia',
        'Aruba',
        'Australia',
        'Austria',
        'Azerbaijan',
        'Bahamas',
        'Bahrain',
        'Bangladesh',
        'Barbados',
        'Belarus',
        'Belgium',
        'Belize',
        'Benin',
        'Bermuda',
        'Bhutan',
        'Bolivia, Plurinational State of',
        'Bonaire, Sint Eustatius and Saba',
        'Bosnia and Herzegovina',
        'Botswana',
        'Bouvet Island',
        'Brazil',
        'British Indian Ocean Territory',
        'Brunei Darussalam',
        'Bulgaria',
        'Burkina Faso',
        'Burundi',
        'Cambodia',
        'Cameroon',
        'Canada',
        'Cape Verde',
        'Cayman Islands',
        'Central African Republic',
        'Chad',
        'Chile',
        'China',
        'Christmas Island',
        'Cocos (Keeling) Islands',
        'Colombia',
        'Comoros',
        'Congo',
        'Congo, The Democratic Republic of The',
        'Cook Islands',
        'Costa Rica',
        'Cote D\'ivoire',
        'Croatia',
        'Cuba',
        'Curacao',
        'Cyprus',
        'Czech Republic',
        'Denmark',
        'Djibouti',
        'Dominica',
        'Dominican Republic',
        'Ecuador',
        'Egypt',
        'El Salvador',
        'Equatorial Guinea',
        'Eritrea',
        'Estonia',
        'Ethiopia',
        'Falkland Islands (Malvinas)',
        'Faroe Islands',
        'Fiji',
        'Finland',
        'France',
        'French Guiana',
        'French Polynesia',
        'French Southern Territories',
        'Gabon',
        'Gambia',
        'Georgia',
        'Germany',
        'Ghana',
        'Gibraltar',
        'Greece',
        'Greenland',
        'Grenada',
        'Guadeloupe',
        'Guam',
        'Guatemala',
        'Guernsey',
        'Guinea',
        'Guinea-bissau',
        'Guyana',
        'Haiti',
        'Heard Island and Mcdonald Islands',
        'Holy See (Vatican City State)',
        'Honduras',
        'Hong Kong',
        'Hungary',
        'Iceland',
        'India',
        'Indonesia',
        'Iran, Islamic Republic of',
        'Iraq',
        'Ireland',
        'Isle of Man',
        'Israel',
        'Italy',
        'Jamaica',
        'Japan',
        'Jersey',
        'Jordan',
        'Kazakhstan',
        'Kenya',
        'Kiribati',
        'Korea, Democratic People\'s Republic of',
        'Korea, Republic of',
        'Kuwait',
        'Kyrgyzstan',
        'Lao People\'s Democratic Republic',
        'Latvia',
        'Lebanon',
        'Lesotho',
        'Liberia',
        'Libya',
        'Liechtenstein',
        'Lithuania',
        'Luxembourg',
        'Macao',
        'Macedonia, The Former Yugoslav Republic of',
        'Madagascar',
        'Malawi',
        'Malaysia',
        'Maldives',
        'Mali',
        'Malta',
        'Marshall Islands',
        'Martinique',
        'Mauritania',
        'Mauritius',
        'Mayotte',
        'Mexico',
        'Micronesia, Federated States of',
        'Moldova, Republic of',
        'Monaco',
        'Mongolia',
        'Montenegro',
        'Montserrat',
        'Morocco',
        'Mozambique',
        'Myanmar',
        'Namibia',
        'Nauru',
        'Nepal',
        'Netherlands',
        'New Caledonia',
        'New Zealand',
        'Nicaragua',
        'Niger',
        'Nigeria',
        'Niue',
        'Norfolk Island',
        'Northern Mariana Islands',
        'Norway',
        'Oman',
        'Pakistan',
        'Palau',
        'Palestinian Territory, Occupied',
        'Panama',
        'Papua New Guinea',
        'Paraguay',
        'Peru',
        'Philippines',
        'Pitcairn',
        'Poland',
        'Portugal',
        'Puerto Rico',
        'Qatar',
        'Reunion',
        'Romania',
        'Russian Federation',
        'Rwanda',
        'Saint Barthelemy',
        'Saint Helena, Ascension and Tristan da Cunha',
        'Saint Kitts and Nevis',
        'Saint Lucia',
        'Saint Martin (French part)',
        'Saint Pierre and Miquelon',
        'Saint Vincent and The Grenadines',
        'Samoa',
        'San Marino',
        'Sao Tome and Principe',
        'Saudi Arabia',
        'Senegal',
        'Serbia',
        'Seychelles',
        'Sierra Leone',
        'Singapore',
        'Sint Maarten (Dutch part)',
        'Slovakia',
        'Slovenia',
        'Solomon Islands',
        'Somalia',
        'South Africa',
        'South Georgia and The South Sandwich Islands',
        'South Sudan',
        'Spain',
        'Sri Lanka',
        'Sudan',
        'Suriname',
        'Svalbard and Jan Mayen',
        'Swaziland',
        'Sweden',
        'Switzerland',
        'Syrian Arab Republic',
        'Taiwan, Province of China',
        'Tajikistan',
        'Tanzania, United Republic of',
        'Thailand',
        'Timor-leste',
        'Togo',
        'Tokelau',
        'Tonga',
        'Trinidad and Tobago',
        'Tunisia',
        'Turkey',
        'Turkmenistan',
        'Turks and Caicos Islands',
        'Tuvalu',
        'Uganda',
        'Ukraine',
        'United Arab Emirates',
        'United Kingdom',
        'United States',
        'United States Minor Outlying Islands',
        'Uruguay',
        'Uzbekistan',
        'Vanuatu',
        'Venezuela, Bolivarian Republic of',
        'Viet Nam',
        'Virgin Islands, British',
        'Virgin Islands, U.S.',
        'Wallis and Futuna',
        'Western Sahara',
        'Yemen',
        'Zambia',
        'Zimbabwe'
    ];

    var terms = document.getElementById('search').value.trim().toLowerCase().split(/\s+/);

    if (!terms[0]) {
        terms = [];
    }

    document.getElementById('terms').innerText = 'Terms: ' + JSON.stringify(terms);

    var startTime = performance.now();

    // Term counts is a map storing how many times each search term appears in the query
    var termCounts = {};

    terms.forEach(function(term) {
        termCounts[term] = (termCounts[term] || 0) + 1;
    });

    // An array of search terms with the duplicates removed
    var uniqueTerms = Object.keys(termCounts);

    // Loop through each option and map to either a highlight version or null
    options = options.map(function(optionText) {
        var matches = {},
            lastMatchIndex = {},
            option = optionText.toLowerCase();

        uniqueTerms.forEach(function(term) {
            // This array will be populated with start/end position of each match for this term
            matches[term] = [];

            // The index of the last match... which could be deduced from the matches but this is slightly easier
            lastMatchIndex[term] = -1;
        });

        var incompleteMatchTerms = uniqueTerms.slice(),
            nextMatchTerm;

        // This is probably a premature optimization but doing it this
        // way ensures we check that each search term occurs at least
        // once as quickly as possible.
        while (nextMatchTerm = incompleteMatchTerms.shift()) {
            var nextMatchIndex = option.indexOf(nextMatchTerm, lastMatchIndex[nextMatchTerm] + 1);

            if (nextMatchIndex === -1) {
                // Haven't found enough matches for this term, so the option doesn't match
                if (termCounts[nextMatchTerm] > matches[nextMatchTerm].length) {
                    return null;
                }
            }
            else {
                // Found another match, put the term back on the queue
                // for another round
                incompleteMatchTerms.push(nextMatchTerm);
                lastMatchIndex[nextMatchTerm] = nextMatchIndex;

                matches[nextMatchTerm].push({
                    start: nextMatchIndex,
                    end: nextMatchIndex + nextMatchTerm.length
                });
            }
        }

        // Pass in the original array of terms... we attempt to highlight in the order of the original query
        var highlights = performHighlight(terms, matches);

        if (!highlights) {
            return null;
        }

        // We need the highlights sorted so that we do the replacing from the end of the string
        highlights.sort(function(h1, h2) {
            return h2.start - h1.start;
        });

        highlights.forEach(function(highlight) {
            optionText = optionText.slice(0, highlight.start)
                    + '<u>' + optionText.slice(highlight.start, highlight.end) + '</u>'
                    + optionText.slice(highlight.end);
        });

        return optionText;

        function performHighlight(terms, allMatches) {
            // If there are no terms left to match we've got a hit
            if (terms.length === 0) {
                return [];
            }

            var nextTerms = terms.slice(),
                term = nextTerms.shift(),
                matches = allMatches[term].slice(),
                match;

            while (match = matches.shift()) {
                var nextMatches = {};

                // We need to purge any entries from nextMatches that overlap the current match
                uniqueTerms.forEach(function(nextTerm) {
                    var nextMatch = term === nextTerm ? matches : allMatches[nextTerm];

                    nextMatches[nextTerm] = nextMatch.filter(function(match2) {
                        return match.start >= match2.end || match.end <= match2.start;
                    });
                });

                var highlights = performHighlight(nextTerms, nextMatches);

                if (highlights) {
                    highlights.push(match);

                    return highlights;
                }
            }

            return null;
        }
    });

    document.getElementById('results').innerHTML = options.map(function(option) {
        if (option) {
            return '<li>' + option + '</li>';
        }

        return '';
    }).join('');

    document.getElementById('time').innerText = Math.round((performance.now() - startTime) * 100) / 100 + 'ms';
}
&#13;
<h1>Permutations</h1>
<input type="text" id="search" onkeyup="search()" autocomplete="off">
<p id="terms"></p>
<p id="time"></p>
<ul id="results"></ul>
&#13;
&#13;
&#13;

<强>更新

根据评论中来自 Mikk3lRo 的反馈,我做了一些性能调优并提出了这个:

https://jsfiddle.net/skirtle/ndeuqn02/1/

核心算法是相同的,但我更难以理解,所有这些都是以性能为名。大多数更改都与尽可能避免创建新对象有关。

由于算法会提前搜索可能永远不需要的东西,因此其他算法总是有机会更快,特别是在简单的情况下。其中许多案例可以单独处理,但我还没有尝试过这种优化。

在Chrome中,它现在在许多不同场景中都优于其他实现,但这是一种不公平的比较,因为它们还没有以相同的方式进行调整。对于简单的搜索,其他实现在Firefox中的速度稍微快一点,但时间都在同一个范围内。

一些特别有趣的搜索:

  • a ab ba baba。我添加了一个新选项&#39;并调整CSS以证明这一点。算法在选择执行突出显示的方式上有所不同。我的算法支持查询中的术语顺序,而不是基于术语的长度。如果我不担心订购,可以进一步优化,但我认为它们仅在极端情况下有所帮助。
  • t r i s t a n d a c u n h a。请注意字母之间的空格,这些是14个单独的搜索字词。如果您一次输入一个术语 Divide and Conquer 将很快开始挣扎,但它会在最后恢复。 擦拭阴影可以更长时间地应对,但当您输入字母c时,它们会从悬崖上掉下来。我认为这是回溯的指数性爆炸,但我还没有证实这一点。我确信通过一些工作可以在简单的情况下解决,但如果回溯是由无法解决的重叠引起的,那么修复可能会比较棘手。

我确信所有的实现都可以通过更多的调整和一些精心设计的特殊情况处理来加速。哪一个实际上是最好的&#39;对于真实场景,我不确定,但我目前的感觉是我的算法可能只有一个狭窄的最佳位置,在真正公平的测试中它会胜过其他人。对于真实搜索而言,一项并不能完成所有预先搜索的算法似乎很难被击败。

更新2

我尝试过对我之前方法的不同实现:

https://jsfiddle.net/skirtle/ndeuqn02/9/

请注意,我只更新了自己的实施,其他的仍未过时。

我以为我会尝试通过懒散地执行它们来避免不必要的搜索,而不是先完成它们。它仍然缓存它们,以便在算法需要回溯时可以重用它们。我不知道这是否会产生重大影响,因为在短字符串上执行少量额外搜索可能不会增加很多开销。

我还试验过切断函数递归。虽然它看起来确实有效,但我觉得它有很高的漏洞风险(需要进行大量的单元测试才能确保它确实有效)。我不相信这部分确实是成功的,因为涉及的数据结构使得它很难遵循。它似乎很快但不足以证明复杂性。

我还尝试了其他方法来构建最终的亮点。所有排序和切片似乎都是性能消耗,但同样,代码变得更加复杂,试图避免它。其中一些增益可能适用于其他算法。

我在这里介绍的另一个想法是查询术语的预搜索分析(仅依赖于查询而不依赖于选项)。它检查术语是否可以重叠,对于任何不可能重叠的术语(例如cat dog),它使用更简单的算法来获取匹配。这个想法也可能适用于其他算法。

正如评论中所提到的,对选项进行某种预搜索分析的想法也是可能的,但我实际上并没有在这里实现。很难知道哪种搜索索引最有利,因为它取决于内存使用情况和选项的具体情况。但是,尝试将少量信息从一次搜索传输到下一次搜索可能更实际。

例如,如果有人搜索united states,那么他们输入的最后一件事很可能是最终s,他们之前的搜索是united state。基于此的两个潜在优化是:

  1. united states的匹配选项将是united state的匹配选项的子集,因此,如果我们记住该列表,我们可以节省必须检查所有其他选项。这可以用于任何算法。
  2. 在我的算法的情况下,匹配缓存可以从一次搜索保留到下一次搜索。虽然state的缓存条目不是任何用途,但united的条目从一个搜索到下一个搜索将完全相同,这样我们就可以跳过算法的昂贵部分术语

答案 1 :(得分:6)

我建议在分歧和征服想法上略有变化:不要将字符串分成块(咬),你可以&#34;消灭&#34;已匹配的字符,并对该字符串执行进一步搜索。要删除的字符将是分隔符,因为它保证不会出现在任何条款中。

这是:

&#13;
&#13;
function trincotWipeSearch(query, options, separator) {
    // Split query in terms at delimiter
    const terms = query.split(separator).filter(Boolean);
    if (!terms.length) return options;
    // Sort terms by descending size
    terms.sort( (a,b) => b.length - a.length );

    // Escape terms, and enrich with size of original term
    // and a string of the same length filled with the separator char
    const items = terms.map(term => ({
        size: term.length,
        wipe: separator.repeat(term.length), 
        regex: new RegExp(term.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&"), 'gi')
    }));

    function getOffsets(termIndex, text) {
        // All terms found?
        if (termIndex >= terms.length) return [];
        let match;
        const { regex, size, wipe } = items[termIndex];
        regex.lastIndex = 0;
        while (match = regex.exec(text)) {
            let index = match.index;
            // Wipe characters and recurse to find other terms
            let offsets = getOffsets(termIndex+1, 
                text.substr(0, index) + wipe + text.substr(index + size));
            if (offsets !== undefined) { 
                // Solution found, backtrack all the way
                return offsets.concat([index, index + size]);
            }
            regex.lastIndex = match.index + 1;
        }
    }

    // Loop through each option
    return options.map( option => {
        // Get the offsets of the matches
        let offsets = getOffsets(0, option);
        if (offsets) {
            // Apply the offsets to add the markup
            offsets
                .sort( (a,b) => b - a )
                .map((index, i) => {
                    option = option.substr(0, index) 
                        + (i%2 ? "<u>" : "</u>")
                        + option.substr(index);
                });
            return option;
        }
    }).filter(Boolean); // get only the non-empty results
}

var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D\'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People\'s Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People\'s Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];

/*
 * I/O and performance measurements 
 */

function processInput() {
    var query = this.value.toLowerCase();
    const t0 = performance.now();
    const matches = trincotWipeSearch(query, options, ' ');
    const spentTime = performance.now() - t0;
    // Output the time spent
    time.textContent = spentTime.toFixed(2);
    // Output the matches
    result.innerHTML = '';
    for (var match of matches) {
        // Append it to the result list
        var li = document.createElement('li');
        li.innerHTML = match;
        result.appendChild(li);
    }
}

findTerms.addEventListener('keyup', processInput);
processInput.call(findTerms);
&#13;
ul { 
    height:300px;
    font-size: smaller;
    overflow: auto;
}
&#13;
Input terms: <input type="text" id="findTerms"><br>

<h3>Trincot's Wipe Search</h3>
Time: <span id="time"></span>ms<br>
<ul id="result"></ul>
&#13;
&#13;
&#13;

我从时间测量中排除了DOM I / O.

这是jsfiddle并排比较两种算法。添加第三种算法以与其他算法进行比较仍然不难。

当分隔符可以是任何正则表达式

...然后无法使用上述功能。解决这个问题的一种方法是引入一个&#34;阴影&#34;字符串,与选项字符串一样长,但只包含2个不同的可能字符(例如.x):

  • 其中一个表示选项字符串中的相应字符(即位于同一位置)已与一个术语匹配,因此不再适用于其他术语的匹配。< / p>

  • 另一个字符表示选项字符串中的相应字符仍可用于包含在术语的匹配中。

显然这会使函数慢一点,因为在检查这个阴影字符串后可能会有一些匹配需要被拒绝:

&#13;
&#13;
function trincotShadowMarks (query, options, separator) {
    // Split query in terms at delimiter
    const terms = query.split(separator).filter(Boolean);
    if (!terms.length) return options;
    // Sort terms by descending size
    terms.sort( (a,b) => b.length - a.length );

    // Escape terms, and enrich with size of original term
    // and a string of the same length filled with the separator char
    const items = terms.map(term => ({
        size: term.length,
        used: 'x'.repeat(term.length),
        free: '.'.repeat(term.length),
        regex: new RegExp(term.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&"), 'gi')
    }));

    function getOffsets(termIndex, text, shadow) {
        // All terms found?
        if (termIndex >= terms.length) return [];
        let match;
        const { regex, size, used, free } = items[termIndex];
        regex.lastIndex = 0;
        while (regex.lastIndex > -1 && (match = regex.exec(text))) {
            let index = match.index;
            // Is this match not overlapping with another match?
            if (!shadow.substr(index, size).includes('x')) {
                // Mark position as used and recurse to find other terms
                let offsets = getOffsets(termIndex+1, text,
                    shadow.substr(0, index) + used + shadow.substr(index + size));
                if (offsets !== undefined) { 
                    // Solution found, backtrack all the way
                    return offsets.concat([index, index + size]);
                }
            }
            regex.lastIndex = shadow.indexOf(free, match.index + 1);
        }
    }

    // Loop through each option
    return options.map( option => {
        // Get the offsets of the matches
        let offsets = getOffsets(0, option, '.'.repeat(option.length));
        if (offsets) {
            // Apply the offsets to add the markup
            offsets
                .sort( (a,b) => b - a )
                .map((index, i) => {
                    option = option.substr(0, index) 
                        + (i%2 ? "<u>" : "</u>")
                        + option.substr(index);
                });
            return option;
        }
    }).filter(Boolean); // get only the non-empty results
}

var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D\'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People\'s Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People\'s Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];

/*
 * I/O and performance measurements 
 */

function processInput() {
    var query = this.value.toLowerCase();
    const t0 = performance.now();
    const matches = trincotShadowMarks(query, options, ' ');
    const spentTime = performance.now() - t0;
    // Output the time spent
    time.textContent = spentTime.toFixed(2);
    // Output the matches
    result.innerHTML = '';
    for (var match of matches) {
        // Append it to the result list
        var li = document.createElement('li');
        li.innerHTML = match;
        result.appendChild(li);
    }
}

findTerms.addEventListener('keyup', processInput);
processInput.call(findTerms);
&#13;
ul { 
    height:300px;
    font-size: smaller;
    overflow: auto;
}
&#13;
Input terms: <input type="text" id="findTerms"><br>

<h3>Trincot's Wipe Search</h3>
Time: <span id="time"></span>ms<br>
<ul id="result"></ul>
&#13;
&#13;
&#13;

答案 2 :(得分:3)

分而治之

比单正则表达式病毒排列策略更复杂 - 这个递归算法一个接一个地搜索每个术语,从最长的术语开始。

每次找到匹配项时,都会将其分成“#34; bite&#34;分为三个(除非在开头或结尾),标记匹配的&#34;咬&#34;消费,并试图匹配任何未消费的&#34;咬伤&#34;中的下一个最长的术语。

如果无法找到最长的不匹配术语,它将回溯并尝试在不同的位置匹配前一个术语(即使在不同的位置&#34;咬&#34;)。

如果它回到最长的一段时间而无法找到另一个与之匹配的位置,它将返回false。

这意味着在大多数情况下,它可以很快返回非匹配,仅仅是因为它们甚至不包含最长的术语。

当然,如果它用完了 - 即。成功匹配最短的 - 它将通过加入所有&#34; bites&#34;来返回突出显示的匹配。回到一起。

演示:

已更新以提高性能:基本算法完全相同,但有一些非常昂贵的arr.slice()调用可以完全避免。

&#13;
&#13;
function divide_and_conquer_replace(query, options, separator) {
    var terms, terms_esc;

    //The inner replacement function
    function divide_and_conquer_inner(bites, depth) {
        var this_term, i, bite, match, new_bites, found_all_others;

        depth = depth ? depth : 1;

        //Get the longest remaining term
        this_term = terms_esc[terms_esc.length - depth];

        //Loop all the bites
        for (i = 0; i < bites.length; i++) {
            bite = bites[i];

            //Reset the lastIndex since we're reusing the RegExp objects
            this_term.lastIndex = 0;

            //Check that we have a string (ie. do not attempt to match bites
            //that are already consumed)
            if (typeof bite === 'string') {
                //Find the next matching position (if any)
                while (match = this_term.exec(bite)) {
                    new_bites = (i > 0) ? bites.slice(0, i) : [];
                    if (match.index > 0) {
                        new_bites.push(bite.slice(0, match.index));
                    }
                    new_bites.push(['<u>' + match[0] + '</u>']);
                    if (this_term.lastIndex < bite.length) {
                        new_bites.push(bite.slice(this_term.lastIndex));
                    }
                    if (i < bites.length - 1) {
                        new_bites = new_bites.concat(bites.slice(i + 1));
                    }

                    if (terms_esc.length > depth) {
                        //Attempt to find all other terms
                        found_all_others = divide_and_conquer_inner(new_bites, depth + 1);

                        //If we found all terms we'll pass the modified string all the
                        //way up to the original callee
                        if (found_all_others) {
                            return found_all_others;
                        }
                        //Otherwise try to match current term somewhere else
                        this_term.lastIndex = match.index + 1;
                    } else {
                        //If no terms remain we have a match
                        return new_bites.join('');
                    }
                }
            }
        }
        //If we reach this point at least one term was not found
        return null;
    };

    // Split query in terms at delimiter
    terms = query.split(separator).filter(Boolean);
    if (!terms.length) return options;


    //Sort terms according to length - longest term last
    terms.sort(function(a, b) {
        return a.length - b.length;
    });

    //Escape terms
    //And store RegExp's instead of strings
    terms_esc = terms.map(function (term) {
        return term.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
    }).map(function (term) {
        return new RegExp(term, 'gi');
    });

    //Loop through each option
    return options.map(function(option){
        return divide_and_conquer_inner([option]);
    }).filter(Boolean);
}

var options = ['United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D\'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People\'s Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People\'s Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];
var separator = ' ';

function divide_and_conquer(){
    var query = document.getElementById('divide_and_conquer').value;
    var res_elm = document.getElementById('divide_and_conquer_result');
    var t0 = performance.now();
    var results = divide_and_conquer_replace(query, options, separator);
    var t1 = performance.now();
    document.getElementById('divide_and_conquer_meta').innerHTML = 'Time: ' + (t1 - t0).toFixed(2) + 'ms';
    res_elm.innerHTML = '';
    for (var result of results) {
        res_elm.innerHTML += '<li>' + result + '</li>';
    }
};
divide_and_conquer();
&#13;
<input type="text" id="divide_and_conquer" onkeyup="divide_and_conquer()">
<p id="divide_and_conquer_meta"></p>
<ul style="height:300px;overflow:auto" id="divide_and_conquer_result"></ul>
&#13;
&#13;
&#13;

当查询完全由(通常很短的)字符串组成时,此策略会出现性能问题,这些字符串在许多选项中都存在/最常见 - 例如a a a a a a a a ...

在现实场景中,它目前的表现优于其他提议的算法 - 请参阅问题中添加的jsperf链接。

答案 3 :(得分:2)

这是一种完全不同于我之前回答的方法 - 我无法将以下所有内容添加到(大小限制)中,所以......它是一个单独的答案。

广义后缀树:预处理选项

generalized suffix tree是一种理论上允许以有效的方式搜索一组字符串中的子串的结构。所以我想我会去做。

this awesome explanation of Ukkonen's algorithm可以看出,以有效的方式构建这样的树是非常简单的,这涉及为一个短语(选项)构建suffix tree

我从实施found here中获得灵感,需要适应:

  • 应用更好的编码风格(例如,摆脱非显式声明的全局变量)
  • 无需在文本后添加分隔符即可使其正常工作。这真的很棘手,我希望我不会错过一些边境条件
  • 使其适用于多个字符串(即一般化)

所以这是:

&#13;
&#13;
"use strict";
// Implementation of a Generalized Suffix Tree using Ukkonen's algorithm
// See also: https://stackoverflow.com/q/9452701/5459839
class Node {
    constructor() {
        this.edges = {};
        this.suffixLink = null;
    }
    addEdge(ch, textId, start, end, node) {
        this.edges[ch] = { textId, start, end, node };
    }
}

class Nikkonen extends Node {
    constructor() {
        super(); // root node of the tree
        this.texts = [];
    }
    findNode(s) {
        if (!s.length) return;
        let node = this,
            len,
            suffixSize = 0,
            edge;
        for (let i = 0; i < s.length; i += len) {
            edge = node.edges[s.charAt(i)];
            if (!edge) return;
            len = Math.min(edge.end - edge.start, s.length - i);
            if (this.texts[edge.textId].substr(edge.start, len) !== s.substr(i, len)) return;
            node = edge.node;
        }
        return { edge, len };
    }
    findAll(term, termId = 1) {
        const { edge, len } = this.findNode(term) || {};
        if (!edge) return {}; // not found
        // Find all leaves
        const matches = new Map;
        (function recurse({ node, textId, start, end }, suffixLen) {
            suffixLen += end - start;
            const edges = Object.values(node.edges);
            if (!edges.length) { // leaf node: calculate the match
                if (!(matches.has(textId))) matches.set(textId, []);
                matches.get(textId).push({ offset: end - suffixLen, termId });
                return;
            }
            edges.forEach( edge => recurse(edge, suffixLen) );
        })(edge, term.length - len);
        return matches;
    }
    addText(text) { 
        // Implements Nikkonen's algorithm for building the tree
        // Inspired by https://felix-halim.net/misc/suffix-tree/
        const root = this,
            active = {
                node: root,
                textId: this.texts.length,
                start: 0,
                end: 0,
            },
            texts = this.texts;
        
        // Private functions
        function getChar(textId, i) {
            return texts[textId].charAt(i) || '$' + textId;
        }
        
        function addEdge(fromNode, textId, start, end, node) {
            fromNode.addEdge(getChar(textId, start), textId, start, end, node);
        }
        
        function testAndSplit() {
            const ch = getChar(active.textId, active.end);
            if (active.start < active.end) {
                const edge = active.node.edges[getChar(active.textId, active.start)],
                    splitPoint = edge.start + active.end - active.start;
                if (ch === getChar(edge.textId, splitPoint)) return;
                const newNode = new Node();
                addEdge(active.node, edge.textId, edge.start, splitPoint, newNode);
                addEdge(newNode, edge.textId, splitPoint, edge.end, edge.node);
                return newNode;
            }
            if (!(ch in active.node.edges)) return active.node;
        }

        function canonize() {
            while (active.start < active.end) {
                const edge = active.node.edges[getChar(active.textId, active.start)]; 
                if (edge.end - edge.start > active.end - active.start) break;
                active.start += edge.end - edge.start; 
                active.node = edge.node;
            }
        }
        
        function update() {
            let prevNewNode = root,
                newNode;
            
            while (newNode = testAndSplit()) {
                addEdge(newNode, active.textId, active.end, text.length+1, new Node());
                // Rule 2: add suffix-link from previously inserted node
                if (prevNewNode !== root) {
                    prevNewNode.suffixLink = newNode; 
                }
                prevNewNode = newNode;
                // Rule 3: follow suffixLink after split
                active.node = active.node.suffixLink || root;
                canonize(); // because active.node changed
            }
            if (prevNewNode !== root) {
                prevNewNode.suffixLink = active.node;
            }
        }

        texts.push(text);

        if (!root.suffixLink) root.suffixLink = new Node(); 
        for (let i = 0; i < text.length; i++) {
            addEdge(root.suffixLink, active.textId, i, i+1, root);
        }

        // Main Ukkonen loop: add each character from left to right to the tree
        while (active.end <= text.length) {
            update();
            active.end++;
            canonize(); // because active.end changed
        }
    }
}

function trincotSuffixTree(query, options, suffixTree, separator) {
    // Split query in terms at delimiter
    const terms = query.split(separator).filter(Boolean);
    if (!terms.length) return options;
    // Sort terms by descending size
    terms.sort( (a,b) => b.length - a.length );
    
    // create Map keyed by term with count info
    const termMap = new Map(terms.map( (term, termId) => [term, { termId, count: 0, leftOver: 0, size: term.length }] ));
    terms.forEach( (term) => termMap.get(term).count++ );
    
    function getNonOverlaps(offsets, leftOver, lastIndex = 0, offsetIndex = 0) {
        // All terms found?
        if (!leftOver) return [];
        let giveUpAt = Infinity;
        // While still enough matches left over:
        while (offsetIndex + leftOver <= offsets.length) {
            const { termId, offset } = offsets[offsetIndex++];
            if (offset < lastIndex) continue; // overlap, try next
            if (offset >= giveUpAt) break; // Looking further makes no sense
            const termInfo = termMap.get(terms[termId]);
            //console.log('termId', termId, 'offset', offset, 'size', termInfo.size, 'lastIndex', lastIndex);
            if (!termInfo.leftOver) continue; // too many of the same term, try next
            termInfo.leftOver--;
            const result = getNonOverlaps(offsets, leftOver - 1, offset + termInfo.size, offsetIndex);
            // If success, then completely backtrack out of recursion.
            if (result) return result.concat([offset + termInfo.size, offset]);
            termInfo.leftOver++; // restore after failed recursive search and try next
            // If a term-match at a given offset could not lead to a solution (in recursion), 
            // and if we keep those matched character postions all unmatched and only start matching after
            // the end of that location, it will certainly not lead to a solution either.
            giveUpAt = Math.min(giveUpAt, offset + termInfo.size);
        }
    }

    let allTermsAllOptionsOffsets;
    // Loop through the unique terms:
    for (let [term, termInfo] of termMap) {
        // Get the offsets of the matches of this term in all options (in the preprocessed tree)
        const thisTermAllOptionsOffsets = suffixTree.findAll(term, termInfo.termId);
        //console.log('findAll:', JSON.stringify(Array.from(thisTermAllOptionsOffsets)));
        if (!thisTermAllOptionsOffsets.size) return []; // No option has this term, so bail out
        if (!allTermsAllOptionsOffsets) {
            allTermsAllOptionsOffsets = thisTermAllOptionsOffsets;
        } else {
            // Merge with all previously found offsets for other terms (intersection)
            for (let [optionId, offsets] of allTermsAllOptionsOffsets) {
                let newOffsets = thisTermAllOptionsOffsets.get(optionId);
                if (!newOffsets || newOffsets.length < termInfo.count) {
                    // this option does not have enough occurrences of this term
                    allTermsAllOptionsOffsets.delete(optionId); 
                } else {
                    allTermsAllOptionsOffsets.set(optionId, offsets.concat(newOffsets));
                }
            }
            if (!allTermsAllOptionsOffsets.size) return []; // No option has all terms, so bail out
        }
    }
    // Per option, see if (and where) the offsets can serve non-overlapping matches for each term
    const matches = Array.from(allTermsAllOptionsOffsets, ([optionId, offsets]) => {
            // Indicate how many of each term must (still) be matched:
            termMap.forEach( obj => obj.leftOver = obj.count );
            return [optionId, getNonOverlaps(offsets.sort( (a, b) => a.offset - b.offset ), terms.length)];
        })
        // Remove options that could not provide non-overlapping offsets
        .filter( ([_, offsets]) => offsets ) 
        // Sort the remaining options in their original order
        .sort( (a,b) => a[0] - b[1] )
        // Replace optionId, by the corresponding text and apply mark-up at the offsets
        .map( ([optionId, offsets]) => {
            let option = options[optionId];
            offsets.map((index, i) => {
                option = option.substr(0, index) 
                    + (i%2 ? "<u>" : "</u>")
                    + option.substr(index);
            });
            return option;            
        });
    //console.log(JSON.stringify(matches));
    return matches;
}

function trincotPreprocess(options) {
    const nikkonen = new Nikkonen();
    // Add all the options (lowercased) to the suffic tree
    options.map(option => option.toLowerCase()).forEach(nikkonen.addText.bind(nikkonen));
    return nikkonen;
}

const options = ['abbbba', 'United States', 'United Kingdom', 'Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia, Plurinational State of', 'Bonaire, Sint Eustatius and Saba', 'Bosnia and Herzegovina', 'Botswana', 'Bouvet Island', 'Brazil', 'British Indian Ocean Territory', 'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Chile', 'China', 'Christmas Island', 'Cocos (Keeling) Islands', 'Colombia', 'Comoros', 'Congo', 'Congo, The Democratic Republic of The', 'Cook Islands', 'Costa Rica', 'Cote D\'ivoire', 'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Falkland Islands (Malvinas)', 'Faroe Islands', 'Fiji', 'Finland', 'France', 'French Guiana', 'French Polynesia', 'French Southern Territories', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Gibraltar', 'Greece', 'Greenland', 'Grenada', 'Guadeloupe', 'Guam', 'Guatemala', 'Guernsey', 'Guinea', 'Guinea-bissau', 'Guyana', 'Haiti', 'Heard Island and Mcdonald Islands', 'Holy See (Vatican City State)', 'Honduras', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran, Islamic Republic of', 'Iraq', 'Ireland', 'Isle of Man', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jersey', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, Democratic People\'s Republic of', 'Korea, Republic of', 'Kuwait', 'Kyrgyzstan', 'Lao People\'s Democratic Republic', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macao', 'Macedonia, The Former Yugoslav Republic of', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Martinique', 'Mauritania', 'Mauritius', 'Mayotte', 'Mexico', 'Micronesia, Federated States of', 'Moldova, Republic of', 'Monaco', 'Mongolia', 'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria', 'Niue', 'Norfolk Island', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Palestinian Territory, Occupied', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Pitcairn', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar', 'Reunion', 'Romania', 'Russian Federation', 'Rwanda', 'Saint Barthelemy', 'Saint Helena, Ascension and Tristan da Cunha', 'Saint Kitts and Nevis', 'Saint Lucia', 'Saint Martin (French part)', 'Saint Pierre and Miquelon', 'Saint Vincent and The Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Georgia and The South Sandwich Islands', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Svalbard and Jan Mayen', 'Swaziland', 'Sweden', 'Switzerland', 'Syrian Arab Republic', 'Taiwan, Province of China', 'Tajikistan', 'Tanzania, United Republic of', 'Thailand', 'Timor-leste', 'Togo', 'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'United States Minor Outlying Islands', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Venezuela, Bolivarian Republic of', 'Viet Nam', 'Virgin Islands, British', 'Virgin Islands, U.S.', 'Wallis and Futuna', 'Western Sahara', 'Yemen', 'Zambia', 'Zimbabwe'];

/*
 * I/O and performance measurements 
 */

let preprocessed;
 
function processInput() {
    if (!preprocessed) { // Only first time
        const t0 = performance.now();
        preprocessed = trincotPreprocess(options);
        const spentTime = performance.now() - t0;
        // Output the time spent on preprocessing
        pretime.textContent = spentTime.toFixed(2);
    }
    var query = this.value.toLowerCase();
    const t0 = performance.now();
    const matches = trincotSuffixTree(query, options, preprocessed, ' ');
    const spentTime = performance.now() - t0;
    // Output the time spent
    time.textContent = spentTime.toFixed(2);
    // Output the matches
    result.innerHTML = '';
    for (var match of matches) {
        // Append it to the result list
        var li = document.createElement('li');
        li.innerHTML = match;
        result.appendChild(li);
    }
}

findTerms.addEventListener('keyup', processInput);
processInput.call(findTerms);
&#13;
ul { 
    height:300px;
    font-size: smaller;
    overflow: auto;
}
&#13;
Input terms: <input type="text" id="findTerms"><br>

<h3>Trincot's Suffix Tree Search</h3>
Preprocessing Time: <span id="pretime"></span>ms (only done once)<br>
Time: <span id="time"></span>ms<br>
<ul id="result"></ul>
&#13;
&#13;
&#13;

这个方法背后有相当多的代码,所以我认为它可能不会显示小数据集的有趣性能,而对于较大的数据集,它将消耗内存:树占用的内存比原始选项数组多得多。 / p>

答案 4 :(得分:0)

更新2

由于在Vue中恢复工作字符串的问题,放弃了缩小集合的概念。

现在,方法简单如下:

  1. 预处理选项集以使显示与工作保持同步。
  2. 处理条款。
  3. 通过迭代来减少(过滤)选项集,并循环使用术语,在不匹配时进行短路。
  4. 使用简化集,迭代每个选项,找到匹配范围。
  5. 在每个匹配范围周围插入HTML字符串。
  6. 代码已注释。

    原始javascript(记录已过滤/操作的选项数组):https://jsfiddle.net/pvLj9uxe/14/

    新Vue实施:https://jsfiddle.net/15prcpxn/30/

    计算似乎相当快 - DOM更新就是杀死它。

    已添加至比较*:https://jsfiddle.net/ektyx133/4/

    *警告:预处理选项(视为“静态”)是策略的一部分,因此它已在基准之外进行处理。

    var separator = /\s|\*|,/;
    
    // this function enhances the raw options array 
    function enhanceOptions(options) {
      return options.map(option => ({
        working: option.toLowerCase(), // for use in filtering the set and matching
        display: option // for displaying
      }))
    }
    
    // this function changes the input to lower case, splits the input into terms, removes empty strings from the array, and enhances the terms with the size and wiping string
    function processInput(input) {
      return input.trim().toLowerCase().split(separator).filter(term => term.length).map(term => ({
        value: term.toLowerCase(),
        size: term.length,
        wipe: " ".repeat(term.length)
      })).sort((a, b) => b.size - a.size);
    }
    
    // this function filters the data set, then finds the match ranges, and finally returns an array with HTML tags inserted
    function filterAndHighlight(terms, enhancedOptions) {
      let options = enhancedOptions,
        l = terms.length;
    
      // filter the options - consider recursion instead
      options = options.filter(option => {
        let i = 0,
          working = option.working,
          term;
        while (i < l) {
          if (!~working.indexOf((term = terms[i]).value)) return false;
          working = working.replace(term.value, term.wipe);
          i++;
        }
        return true;
      })
    
      // generate the display string array
      let displayOptions = options.map(option => {
        let rangeSet = [],
          working = option.working,
          display = option.display;
    
        // find the match ranges
        terms.forEach(term => {
          working = working.replace(term.value, (match, offset) => { // duplicate the wipe string replacement from the filter, but grab the offsets
            rangeSet.push({
              start: offset,
              end: offset + term.size
            });
            return term.wipe;
          })
        })
    
        // sort the match ranges, last to first
        rangeSet.sort((a, b) => b.start - a.start);
    
        // insert the html tags within the string around each match range
        rangeSet.forEach(range => {
          display = display.slice(0, range.start) + '<u>' + display.slice(range.start, range.end) + '</u>' + display.slice(range.end)
        })
    
        return display;
    
      })
    
      return displayOptions;
    }
    

    Old Attempt

    https://jsfiddle.net/15prcpxn/25/

    我的尝试,使用Vue进行渲染(这些方法是顺序的,所以你可以毫不费力地将它全部放入一个单片函数中 - 输入将是术语,完整选项集;输出将被过滤选项集,并突出显示范围)。

    1. 将输入拆分为单独的术语
    2. 按长度对字词进行排序(首先是最长字词,因此当您拥有"abc ab"等选项和"a abc"这两个字词时,即术语是另一个字词的子字符串时,它就能够匹配"abc"
    3. 将条款复制/更改为小写
    4. 将选项(我们的“显示集”)复制到小写(我们的“工作集”)
    5. 对于每个术语,删除没有来自“工作集”的匹配的工作选项,并且并行地从“显示集”中删除显示选项 - 这样做时,从幸存的工作选项字符串中删除匹配的术语字符串,例如,选项"a"中匹配的术语"abc"会产生"bc" [实际实现是相反的:对于每个术语,在匹配时重新创建“工作集”,并行添加显示选项到“显示集”,然后将这些集传递给下一个术语] - 这为我们提供了过滤的显示集
    6. 将过滤后的显示集复制为小写,为我们提供新的过滤工作集
    7. 对于剩余过滤工作集中的每个工作选项,通过记录范围(即开始和结束,例如在选项"a"中匹配术语"abc")创建范围集:{{1}通过获取匹配的偏移量(开始)和术语/匹配的长度,每个术语匹配。将匹配的字符串替换为与该术语长度相等的空格(或其他未使用的字符),并将其提供给下一个术语,例如选项start = 0, end = 1中匹配的术语"a"会产生"abc" - 这会保留工作选项的长度,确保过滤的工作集(小写)与过滤的显示集(原始案例)匹配。范围集的总数将等于已过滤选项集中剩余选项的数量。
    8. 此外,对每个范围集内的范围进行排序(按降序排列,以反向工作),以允许字符串插入。
    9. 对于已过滤显示集中的每个选项,(反向工作以便在操作字符串时不会干扰索引),通过切片匹配范围,在匹配范围周围插入" bc" <u>标记显示选项,例如选项</u>中的匹配字词"a""abc"
    10. 渲染
    11. 当有许多匹配/无用的术语时(例如,当您输入单个字符时),性能很差。对于最终用途,我可能会输入计算延迟。

      我应该能够将其中一些步骤汇总到更少的步骤中,从而提高性能。我明天再来。

      Vue大概也会通过虚拟DOM等来处理一些优化,所以它不一定能反映出vanilla Javascript / DOM渲染。