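Match startsWith on a per-word basis, with spaces treated as an AND operator

Time: 2015-10-08 14:13:18

Tags: javascript regex

I need to facilitate filtering in a huge list of strings. The user will type a few characters, which will be used to perform a startsWith match on a per-word basis. Any white-space character should be treated as an AND operator.

Let's say the user types Ad Ade A; it should then match strings that contain words starting with Ad, Ade and A (order is not important). Every word the user types should have at least one startsWith match in the string.

e.g. 1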

Af Ele Ada

will match

"Adam likes African Elephants"
"Test Adam Africa Elephant Africa Adam"

but will not match

"Adam likes Australian Elephants" (since no word starts with Af)

e.g. 2

Ad Ade A

will match

"JunkCharacters Adenine Test1 Adam Test2 Abcd Test3"

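but will not match

"Adam Adam Adam" (since no word starts with Ade)
"Adenine" (since Ade matches Adenine and there are no matches left for Ad and A).

Is it possible to construct a regex for this match? A single regex is preferred if possible.

3 answers:

Answer 0: (score: 3)

Using lookaheads, you can do this in a single regex: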

^(?=.*\b(Ade\w*)\b)(?=.*\b(?!\1)(Ad\w*)\b)(?=.*\b(?!\1|\2)(A\w*)\b).*


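Code to build the regex:

function lookaheads(n) {
   // emit one negative lookahead per earlier capture group: (?!.*\1)(?!.*\2)...
   var str = "";
   for (var i = 1; i <= n; i++)
      str += "(?!.*\\" + i + ")";
   return str;
}

var s = 'Ade Ad A';

var c = 0;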
var re = new RegExp('^' + s.split(/\s+/g).map(function (m) {
   return "(?=.*\\b" + lookaheads(c++) + "(" + m + "\\w*)\\b)";
}).join(''), 'g');

//=> /^(?=.*\b(Ade\w*)\b)(?=.*\b(?!.*\1)(Ad\w*)\b)(?=.*\b(?!.*\1)(?!.*\2)(A\w*)\b)/g

Now test it:

re.test("Adenine Adam Ball A")
true

re.test("Adenine")
false

re.test('JunkCharacters Adenine Test1 Adam Test2 Abcd Test3')
true

Answer 1: (score: 1)

Create a regex for each token and AND the results together, like this:

string.match(/\bAd/) && string.match(/\bAde/) && string.match(/\bA/)
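
The same idea can be generalized to arbitrary user input. Below is a minimal sketch, assuming two hypothetical helpers, `escapeRegExp` and `matchesAllTokens` (neither name comes from the answer), that escapes regex metacharacters in each token before building the per-token patterns. Like the one-liner above, it only checks that every token has some match; it does not require different tokens to match different words.

```javascript
// Hypothetical helper: escape regex metacharacters in user input.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true when every whitespace-separated token
// matches the start of at least one word in `string`.
function matchesAllTokens(input, string) {
  // Drop empty tokens produced by leading or trailing whitespace.
  var tokens = input.split(/\s+/).filter(function (token) {
    return token.length > 0;
  });
  return tokens.every(function (token) {
    // \b anchors the token at the start of a word.
    return new RegExp('\\b' + escapeRegExp(token)).test(string);
  });
}

console.log(matchesAllTokens('Af Ele Ada', 'Adam likes African Elephants'));
console.log(matchesAllTokens('Af Ele Ada', 'Adam likes Australian Elephants'));
```

With an empty input there are no tokens, so every string passes the filter; whether that is the desired behaviour depends on the UI.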

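Also, depending on the search characteristics, you might consider building a word index. With an index, startsWith can be a very fast operation: O(log n) with an index vs. O(n) without.

To elaborate on the index:

You can build an inverted index. Suppose you have the documents: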

1 "Adam likes African Elephants"
2 "Test Adam Africa Elephant Africa Adam"
3 "Adam likes Australian Elephants"
4 "JunkCharacters Adenine Test1 Adam Test2 Abcd Test3"

Your inverted index would look like:

Adam       1 2 3 4
African    1
Elephant   2
Elephants  1 3
likes      1 3
etc.

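In that index, because of the startsWith nature of the search, you can do a binary search for the token, which makes the search very fast: O(log n).

Building the index takes time. So if your documents change a lot, or you have relatively few documents, it may not be worth it.

Answer 2: (score: 1)

How about something like this?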
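
The inverted index described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the names `buildIndex`, `docsWithPrefix` and `searchIndex` are hypothetical, matching is case-sensitive, words are split on whitespace only, and document ids are 0-based array positions rather than the 1-based numbers used in the listing.

```javascript
// Hypothetical: build a sorted word list plus word -> document ids.
function buildIndex(documents) {
  var postings = Object.create(null);
  documents.forEach(function (doc, id) {
    doc.split(/\s+/).forEach(function (word) {
      if (!word) return;
      if (!(word in postings)) postings[word] = [];
      if (postings[word].indexOf(id) === -1) postings[word].push(id);
    });
  });
  // Default sort compares UTF-16 code units, the same order as `<` below.
  return { words: Object.keys(postings).sort(), postings: postings };
}

// Hypothetical: ids of documents containing a word that starts with `prefix`.
function docsWithPrefix(index, prefix) {
  var lo = 0, hi = index.words.length;
  // Binary search for the first word that is >= prefix.
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (index.words[mid] < prefix) lo = mid + 1; else hi = mid;
  }
  // Words sharing the prefix are contiguous from that position.
  var ids = {};
  for (var i = lo; i < index.words.length && index.words[i].indexOf(prefix) === 0; i++) {
    index.postings[index.words[i]].forEach(function (id) { ids[id] = true; });
  }
  return ids;
}

// Hypothetical: documents in which every token has a prefix match.
function searchIndex(index, documents, input) {
  var tokens = input.split(/\s+/).filter(function (t) { return t.length > 0; });
  var idSets = tokens.map(function (t) { return docsWithPrefix(index, t); });
  return documents.filter(function (doc, id) {
    return idSets.every(function (ids) { return ids[id] === true; });
  });
}

var exampleDocs = [
  "Adam likes African Elephants",
  "Test Adam Africa Elephant Africa Adam",
  "Adam likes Australian Elephants",
  "JunkCharacters Adenine Test1 Adam Test2 Abcd Test3"
];
var exampleIndex = buildIndex(exampleDocs);
console.log(searchIndex(exampleIndex, exampleDocs, "Af Ele Ada")); // the first two documents
console.log(searchIndex(exampleIndex, exampleDocs, "Ad Ade A"));   // only the last document
```

The binary search finds the first candidate word in O(log n); the scan that follows is proportional to the number of indexed words sharing the prefix, so very short prefixes such as A still touch many entries.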

http://jsbin.com/temaso/edit?js,console

var sentances = [
    "Adam likes African Elephants",
    "Test Adam Africa Elephant Africa Adam",
    "JunkCharacters Adenine Test1 Adam Test2 Abcd Test3",
    "Adam Adam Adam",
    "Adenine"
];

var goodMatch = function(searchstring,sentance) {

  // ensure ALL search words match at least ONE sentance word
  return searchstring.split(/\s+/).every(function(searchWord) {
    return sentance.split(/\s+/).some(function(targetWord) {

        // ensure search word has a length, and create regex based on it
        return searchWord.length && new RegExp('^'+searchWord).test(targetWord);
    });
  });
};

var search = function(searchstring) {
    return sentances.filter(function(sentance){
        return goodMatch(searchstring,sentance);
    });
};



console.log( search("Af Ele Ada") );
console.log( search("Ad Ade A") );