I have a fairly large list of terms, about 6,000 of them.

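The accepted answer to that question is excellent, but having never used XPath, I was lost as soon as problems started to appear. At one point, after fiddling with the code, I managed to add more than 40,000 random characters to our database, most of which had to be removed by hand. Since then I have lost faith in that idea, and the simpler PHP solutions are simply not efficient enough for the volume of data and the number of terms.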


This answer has an idea I would like to try.


var words = [
    {
        word: 'Something',
        link: 'http://www.something.com'
    },
    {
        word: 'Something Else',
        link: 'http://www.something.com/else'
    }
];

//for each array element
$.each(words,
    function() {
        //store it ("this" is gonna become the dom element in the next function)
        var search = this;
        $('.message').each(
            function() {
                //if it's exactly the same
                if ($(this).text() === search.word) {
                    //do your magic tricks
                    $(this).html('<a href="' + search.link + '">' + search.link + '</a>');
                }
            }
        );
    }
);

  • Can I do this?
  • What can I do to make it as efficient as possible?


Table: Posts
id        post
102       "Google is a search engine"


Table: cached_Posts
id       post_id   date_generated   cached_post
1        102       2012-10-10       "<a href="http://google.com">Google</a> is a search engine"
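
The caching idea in the two tables above can be sketched roughly as follows. This is only an illustration under assumptions: the rows live in an in-memory object instead of a real cached_Posts table, createPostCache and its arguments are hypothetical names, and the rule "regenerate when the dictionary changed after date_generated" is my guess at a sensible invalidation policy, not something stated above.

```javascript
// Hypothetical in-memory stand-in for the cached_Posts table.
// linkify is whatever function turns plain post text into linked HTML.
function createPostCache(linkify) {
    var rows = Object.create(null); // keyed by post_id

    return {
        // post: { id, post }, dictionaryUpdatedAt and now: Date objects
        get: function (post, dictionaryUpdatedAt, now) {
            var row = rows[post.id];
            // (re)generate when there is no cached row yet, or when the
            // term list changed after the row was generated (assumed policy)
            if (!row || row.dateGenerated < dictionaryUpdatedAt) {
                row = rows[post.id] = {
                    postId: post.id,
                    dateGenerated: now,
                    cachedPost: linkify(post.post)
                };
            }
            return row.cachedPost;
        }
    };
}
```

With this shape the expensive replacement runs once per post per dictionary change, and every other page view only reads the cached row.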




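  1. jQuery's $.each() function, although very useful, is not efficient. Run this benchmark and you will see what I mean: http://jsperf.com/jquery-each-vs-for-loops/9

  2. If you run $('.message') on every iteration of the loop, you may end up doing a lot of fairly expensive DOM traversal. If possible, cache the result of this operation in a variable before you start looping over words.

  3. Are you relying on every instance of the "search" text being wrapped by an element with the class message, with no other text around it? Because that is what the line if ($(this).text() === search.word) { implies. In your other question you seemed to suggest that there is more text around the terms to be replaced, in which case you may want to look at regular expressions to perform the replacement. You will also need to make sure that the text is not already contained in an <a> tag.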
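
To make the regular-expression suggestion concrete, here is a minimal sketch that compiles all terms into one expression and replaces them in a single pass over a string. It assumes the { word, link } shape from the question, that the input is plain text rather than HTML, and that every term starts and ends with a word character so that \b works. buildLinker and escapeRegExp are hypothetical names, not part of jQuery.

```javascript
// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the regular expression once, then reuse the returned function
// for every message instead of looping over all terms per message.
function buildLinker(words) {
    var links = Object.create(null), patterns;

    words.forEach(function (entry) {
        links[entry.word] = entry.link;
    });
    patterns = words
        .map(function (entry) { return entry.word; })
        // longer terms first, so "Something Else" wins over "Something"
        .sort(function (a, b) { return b.length - a.length; })
        .map(escapeRegExp);

    var regex = new RegExp("\\b(?:" + patterns.join("|") + ")\\b", "g");

    return function (text) {
        return text.replace(regex, function (term) {
            return '<a href="' + links[term] + '">' + term + "</a>";
        });
    };
}

var link = buildLinker([
    { word: "Something", link: "http://www.something.com" },
    { word: "Something Else", link: "http://www.something.com/else" }
]);
console.log(link("Something Else and Something."));
```

In the page itself you would still select $('.message') once, keep the result in a variable, and apply the linker only to text that is not already inside an <a> element.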

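What I came up with is something relatively simple. Sorry, it has not been thoroughly tested or benchmarked. I am sure it can be optimized further, I just do not have the time to do it. I added some comments to make it easier to follow: http://pastebin.com/nkdTSvi6. It may be a bit long for StackOverflow, but I will post it here anyway. For more comfortable viewing, use the pastebin.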

function buildTrie(hash) {
    "use strict";
    // A very simple function to build a Trie
    // we could compress this later, but simplicity
    // is better for this example. If we don't
    // perform well, we'll try to optimize this a bit
    // there is a room for optimization here.
    var p, result = {}, leaf, i;
    for (p in hash) {
        if (hash.hasOwnProperty(p)) {
            leaf = result;
            i = 0;
            do {
                if (p[i] in leaf) {
                    leaf = leaf[p[i]];
                } else {
                    leaf = leaf[p[i]] = {};
                }
                i += 1;
            } while (i < p.length);
            // since, obviously, no character
            // equals to empty character, we'll
            // use it to store the reference to the
            // original value
            leaf[""] = hash[p];
        }
    }
    return result;
}

function prefixReplaceHtml(html, trie) {
    "use strict";
    var i, len = html.length, result = [], lastMatch = 0,
        current, leaf, match, matched, replacement, pending;
    for (i = 0; i < len; i += 1) {
        current = html[i];
        if (current === "<") {
            // don't check for out of bounds access
            // assume we never face a situation, when
            // "<" is the last character in an HTML
            pending = matched;
            if (match) {
                result.push(
                    html.substring(lastMatch, i - matched.length),
                    "<a href=\"", match, "\">", replacement, "</a>");
                lastMatch = i - matched.length + replacement.length;
                i = lastMatch - 1;
            } else {
                if (matched) {
                    // go back to the second character of the
                    // matched string and try again
                    i = i - matched.length;
                }
            }
            matched = match = replacement = leaf = "";
            if (pending) {
                // i was moved back above, so let the loop reach
                // this "<" again with a clean state before the
                // tag is inspected (otherwise html[i + 1] would
                // not point into the tag)
                continue;
            }
            if (html[i + 1] === "a") {
                // we want to skip replacing inside
                // anchor tags. We also assume they
                // are never nested, as valid HTML is
                // against that idea
                if (html[i + 2] in
                    { " " : 1, "\t" : 1, "\r" : 1, "\n" : 1 }) {
                    // this is certainly an anchor
                    i = html.indexOf("</a", i + 3) + 3;
                    continue;
                }
            }
            // if we got here, it's a regular tag, just look
            // for terminating ">"
            i = html.indexOf(">", i + 1);
            continue;
        }
        // if we got here, we need to start checking
        // for the match in the trie
        if (!leaf) {
            leaf = trie;
        }
        leaf = leaf[current];
        // we prefer longest possible match, just like POSIX
        // regular expressions do
        if (leaf && ("" in leaf)) {
            match = leaf[""];
            replacement = html.substring(
                i - (matched ? matched.length : 0), i + 1);
        }
        if (!leaf) {
            // newby-style inline (all hand work!) pay extra
            // attention, this code is duplicated few lines above
            if (match) {
                result.push(
                    html.substring(lastMatch, i - matched.length),
                    "<a href=\"", match, "\">", replacement, "</a>");
                lastMatch = i - matched.length + replacement.length;
                i = lastMatch - 1;
            } else {
                if (matched) {
                    // go back to the second character of the
                    // matched string and try again
                    i = i - matched.length;
                }
            }
            matched = match = replacement = "";
        } else if (matched) {
            // perhaps a bit premature, but we'll try to avoid
            // string concatenation, when we can.
            matched = html.substring(i - matched.length, i + 1);
        } else {
            matched = current;
        }
    }
    // append whatever follows the last replacement, otherwise
    // the tail of the document would be lost
    result.push(html.substring(lastMatch));
    return result.join("");
}

function testPrefixReplace() {
    "use strict";
    var trie = buildTrie(
        { "x" : "www.xxx.com", "yyy" : "www.y.com",
          "xy" : "www.xy.com", "yy" : "www.why.com" });
    return prefixReplaceHtml(
        "<html><head>x</head><body><a >yyy</a><p>" +
            "xyyy yy x xy</p><abrval><yy>xxy</yy>", trie);
}

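To look at the longest-match idea on its own, without the tag-skipping logic, here is a small self-contained sketch that works on plain text only. It is not the code from the pastebin: makeTrie, longestMatchAt and linkPlainText are hypothetical names, and the only thing borrowed from the answer is the convention of storing the value under the empty-string key.

```javascript
// Build a trie where each node maps one character to a child node
// and the key "" holds the link of a term that ends at that node.
function makeTrie(terms) {
    var root = Object.create(null);
    Object.keys(terms).forEach(function (term) {
        var node = root;
        term.split("").forEach(function (ch) {
            node = node[ch] || (node[ch] = Object.create(null));
        });
        node[""] = terms[term];
    });
    return root;
}

// Walk the trie from text[start] and remember the longest term seen.
function longestMatchAt(trie, text, start) {
    var node = trie, best = null, i;
    for (i = start; i < text.length; i += 1) {
        node = node[text[i]];
        if (!node) {
            break;
        }
        if ("" in node) {
            best = { length: i - start + 1, link: node[""] };
        }
    }
    return best;
}

// Replace every longest match in a plain-text string with a link.
function linkPlainText(text, terms) {
    var trie = makeTrie(terms), out = [], i = 0, plainStart = 0, hit;
    while (i < text.length) {
        hit = longestMatchAt(trie, text, i);
        if (hit) {
            out.push(text.substring(plainStart, i),
                '<a href="', hit.link, '">',
                text.substring(i, i + hit.length), "</a>");
            i += hit.length;
            plainStart = i;
        } else {
            i += 1;
        }
    }
    out.push(text.substring(plainStart));
    return out.join("");
}
```

Each character is visited only a few times regardless of how many terms there are, which is why a trie scales better to thousands of terms than looping over the term list for every message.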
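Step 1: drop the AJAX requirement. Ajax is meant for interacting with the server: you submit a small amount of data and get a response back. It is not suited to what you want.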



  • Load the messages
  • Load the "dictionary"
  • Loop over every word in the dictionary
    • Look for a match in the DOM (ouch)
      • Replace



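  • The server is better suited to this kind of work
  • JS runs in the client's browser. Every client is different (for example, someone may be using a poorly performing IE, or someone may be on a smartphone)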


    $dict[] = array('word' => 'dolor', 'link' => 'DOLORRRRRR');
    $dict[] = array('word' => 'nulla', 'link' => 'NULLAAAARRRR');

    //  Pretty sure there's a more efficient way to separate an array.. my PHP is rusty, sorry. 
    $terms = array();
    $replace = array();
    foreach ($dict as $v) {
        // If you want to make sure it's a complete word, add a space to the term. 
        $terms[] = ' ' . $v['word'] . ' ';
        $replace[] = ' '. $v['link'] . ' ';
    }

    $text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    echo str_replace($terms, $replace, $text);

    /* Output: 
    Lorem ipsum DOLORRRRRR sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure DOLORRRRRR in reprehenderit in voluptate velit esse cillum dolore eu fugiat NULLAAAARRRR pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    */


This script is very basic, though: it does not handle differences in letter case.

