Question

Suppose I have two strings that might look like this:

var tester   = "hello I have to ask you a doubt";
var sentence = "hello better explain me the doubt";   // `case` is a reserved word in JavaScript

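In this case both strings contain common words such as hello and doubt. Suppose my default string is tester, and I have a second variable holding a set of words that could be anything. I want to count the common words present in both tester and that second string, and get the result back as an object.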

Result

{"hello" : 1, "doubt" : 1};

My current implementation is as follows:

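var tester = "hello I have to ask you a doubt";
// `case` is a reserved word in JavaScript, so the parameter is named `sentence`.
function getMeRepeatedWordsDetails(sentence){
    var defaultWords = tester.split(" ");
    var testWords    = sentence.split(" "), result = {};
    // for...in over an array yields indices ("0", "1", ...), not the words,
    // so loop by index and compare the words themselves.
    for(var i = 0; i < testWords.length; i++){
        for(var j = 0; j < defaultWords.length; j++){
            if(defaultWords[j] === testWords[i]){
                var word = testWords[i];
                result[word] = result[word] ? result[word] + 1 : 1;
            }
        }
    }
    return result;
}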
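If the goal is only to count the words the two strings share, a regular expression is not strictly required. The sketch below is one alternative, not the asker's code: it assumes whitespace-separated words compared case-sensitively, and puts tester's words into a Set (ES2015) so each lookup is a constant-time `has` call instead of a nested scan over `defaultWords`. The function name `countCommonWords` is hypothetical.

```javascript
// Hypothetical helper: counts how often each word of `sentence`
// also appears somewhere in `reference` (case-sensitive, whitespace-separated).
function countCommonWords(reference, sentence) {
  // \S+ treats any run of whitespace as one separator; match() returns null when there are no words.
  const referenceWords = new Set(reference.match(/\S+/g) || []);
  const result = {};
  for (const word of sentence.match(/\S+/g) || []) {
    if (referenceWords.has(word)) {
      result[word] = (result[word] || 0) + 1;
    }
  }
  return result;
}

console.log(countCommonWords("hello I have to ask you a doubt",
                             "hello better explain me the doubt"));
// logs { hello: 1, doubt: 1 } in Node.js
```

Unlike the nested loop, a word repeated in tester is not counted once per repetition; each occurrence in the second string counts once.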

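I suspect a regular expression could make this task easier, since it can find pattern matches, but I'm not sure it can be done with regex. I'd like to know whether I'm on the right path.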

Answer 1

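You can use a first regular expression as a tokenizer to split the tester string into a list of words, then use those words to build a second regular expression that matches any word in that list. For example: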

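var tester = "a string with a lot of words";

function getMeRepeatedWordsDetails ( sentence ) {
  // Append a space so the trailing \W in regex2 can also match after the last word.
  sentence = sentence + " ";
  // Tokenizer: any run of non-whitespace characters is a word.
  var regex = /[^\s]+/g;
  // Alternation of tester's words, each of which must be followed by a non-word character.
  var regex2 = new RegExp ( "(" + tester.match ( regex ).join ( "|" ) + ")\\W", "g" );
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = sentence.match ( regex2 ) || [];
  var words = {};
  for ( var i = 0; i < matches.length; i++ ) {
    // Strip the extra non-word character grabbed by \W.
    var match = matches [ i ].replace ( /\W/g, "" );
    var w = words [ match ];
    if ( ! w )
      words [ match ] = 1;
    else
      words [ match ]++;
  }
  return words;
}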

console.log ( getMeRepeatedWordsDetails ( "another string with some words" ) );

The tokenizer is this line:

var regex = /[^\s]+/g;

When you do:

tester.match ( regex )

you get the list of words contained in tester:

[ "a", "string", "with", "a", "lot", "of", "words" ]
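This tokenizer differs from the question's `split(" ")` when words are separated by more than one space: `split(" ")` yields an empty string between consecutive spaces, while the tokenizer returns only runs of non-whitespace (it also treats tabs and newlines as separators). A quick comparison on a made-up sample string:

```javascript
// Illustrative input with a double space between "a" and "string".
const sample = "a  string with words";

// split(" ") produces an empty string for the second space.
console.log(sample.split(" "));       // [ 'a', '', 'string', 'with', 'words' ]

// The tokenizer treats any whitespace run as a single separator.
console.log(sample.match(/[^\s]+/g)); // [ 'a', 'string', 'with', 'words' ]
```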

From this array we build a second regular expression that matches any of those words; regex2 has the form:

/(a|string|with|a|lot|of|words)\W/g
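The `join("|")` step assumes tester's words contain no regular-expression metacharacters. If tester could contain punctuation such as `doubt?` or `(maybe)`, the alternation would change meaning or fail to compile. A minimal sketch, using hypothetical helpers `escapeRegExp` and `buildWordRegex`, escapes each word (with the character class commonly recommended on MDN) and drops duplicates such as the repeated `a` before joining:

```javascript
// Hypothetical helper: backslash-escape every character with a special meaning in a RegExp.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex2-style pattern from a list of words.
function buildWordRegex(words) {
  // Deduplicate, then escape each word before joining with "|".
  const alternatives = Array.from(new Set(words)).map(escapeRegExp);
  return new RegExp("(" + alternatives.join("|") + ")\\W", "g");
}

console.log(buildWordRegex(["a", "string", "with", "a", "lot", "of", "words"]).source);
// (a|string|with|lot|of|words)\W
console.log(buildWordRegex(["doubt?", "(maybe)"]).source);
// (doubt\?|\(maybe\))\W
```

The duplicate `a` in the original pattern is harmless for matching; removing it only keeps the pattern shorter.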

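The \W is added so that a word from tester must be followed by a non-word character; otherwise the a element would match the start of any word beginning with a, such as another. Applying regex2 to sentence returns another array containing only the words listed in regex2, that is, the words contained in both tester and sentence. The for loop then just counts the words in the matches array and turns them into the object you asked for.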

But note:

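You have to add at least one space at the end of sentence, otherwise the \W in regex2 cannot match after the last word: sentence = sentence + " "
You have to strip the extra non-word character that \W grabs from each match: matches [ i ].replace ( /\W/g, "" )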
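Both caveats come from using \W as the boundary. \W also constrains only the character after a word, so a would still match the tail of a word such as data. One possible variation, sketched here rather than taken from the answer, wraps the alternation in `\b` word boundaries instead. `\b` is zero-width, so there is nothing to strip and no space to append. It assumes tester's words are made of word characters (`[A-Za-z0-9_]`), since that is how `\b` is defined; the function name `countWithBoundaries` is hypothetical.

```javascript
// Hypothetical variant: count the words of `sentence` that also appear in `tester`,
// using zero-width \b boundaries instead of a trailing \W.
function countWithBoundaries(tester, sentence) {
  // Same tokenizer as the answer, plus deduplication and escaping of metacharacters.
  const words = Array.from(new Set(tester.match(/[^\s]+/g) || []))
    .map(w => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  if (words.length === 0) return {};
  // \b on both sides: "a" matches neither inside "another" nor at the end of "data".
  const regex2 = new RegExp("\\b(?:" + words.join("|") + ")\\b", "g");
  const counts = {};
  // \b consumes no characters, so each match is exactly the word.
  for (const match of sentence.match(regex2) || []) {
    counts[match] = (counts[match] || 0) + 1;
  }
  return counts;
}

console.log(countWithBoundaries("a string with a lot of words", "a data string with some words"));
// { a: 1, string: 1, with: 1, words: 1 }
```

For the same input, the \W version counts a twice, because it also matches the a at the end of data.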
