JavaScript regular expressions and capture groups

Time: 2015-11-10 01:18:26

Tags: javascript regex

I am new to regular expressions in JavaScript, and I am having trouble getting an array of matches from a text string like the following:

Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat

I would like to get an array of matches like this:

match[0] = [
    'foo',
    'bar'
]
match[1] = [
    'baz',
    'bat'
]

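In short, what I am looking for is:

"Any dash+word (-foo, -bar, etc.) AFTER a sentence"

Can anyone provide a pattern that captures all iterations rather than just the last one? A repeated capture group only captures its last iteration. Forgive me if this is a silly question. I am using regex101, if anyone wants to send me some tests.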
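
To make the limitation concrete, here is a minimal sketch. The pattern and the sample string are chosen for illustration only and are not taken from the question's regex101 tests.

```javascript
// A repeated capturing group: the group is overwritten on each repetition.
const text = 'Sentence would go here\n-foo\n-bar\n';
const repeated = /(?:\n-(\w+))+/.exec(text);

// repeated[0] spans both dash lines, but repeated[1] holds only the last word.
console.log(JSON.stringify(repeated[0])); // "\n-foo\n-bar"
console.log(repeated[1]);                 // bar
```

The whole match covers both dash lines, yet only "bar" survives in the capture group, which is why some form of looping or splitting is needed.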

3 answers:

Answer 0 (score: 2)

The first regex I came up with is the following:

/([^-]+)(-\w+)/g

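The first group, ([^-]+), grabs everything that is not a dash. It is followed by the capture group we actually want, (-\w+). We add the g flag so that the regex object keeps track of the last position it looked at. This means that each time we run regex.exec(search), we get the next match you see in regex101.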
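
As a rough illustration of how the g flag makes exec() resume where it left off, the sketch below calls exec() three times on a shortened sample string (the string is an assumption made for the example) and inspects lastIndex.

```javascript
// Same pattern as in the answer; the g flag makes exec() resume from lastIndex.
const regex = /([^-]+)(-\w+)/g;
const search = 'Sentence would go here\n-foo\n-bar\n';

const first = regex.exec(search);
const afterFirst = regex.lastIndex;   // index just past "-foo"
const second = regex.exec(search);
const third = regex.exec(search);     // no further dash word: null, lastIndex resets to 0

console.log(first[2], afterFirst);    // -foo 27
console.log(second[2]);               // -bar
console.log(third, regex.lastIndex);  // null 0
```

Without the g flag, lastIndex is ignored and every call would return the first match again.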

Note: In JavaScript, \w is equivalent to [a-zA-Z0-9_]. So if you want to match letters only, use [a-zA-Z] instead of \w.
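
A small comparison may help here. The sample string below is made up for the example and contains a digit and an underscore so that the two character classes behave differently.

```javascript
const sample = '-foo_1 -bar2 -baz';

// \w accepts letters, digits and the underscore.
const withWord = sample.match(/-\w+/g);
// [a-zA-Z] stops at the first digit or underscore.
const lettersOnly = sample.match(/-[a-zA-Z]+/g);

console.log(withWord);    // [ '-foo_1', '-bar2', '-baz' ]
console.log(lettersOnly); // [ '-foo', '-bar', '-baz' ]
```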

Here is code that implements this regex.

<p id = "input">
    Sentence would go here
    -foo
    -bar
    Another sentence would go here
    -baz
    -bat
</p>

<p id = "output">

</p>

<script>
    // Returns true if the text contains a word character, i.e. we have
    // reached a sentence rather than just whitespace between dash words.
    function check_for_word(search) {return /\w/.test(search);}
    function capture(regex, search) {
        var 
        // The initial match.
            match  = regex.exec(search),
        // Stores all of the results from the search.
            result = [],
        // Used to gather results.
            gather;
        while(match) {
            // Create something empty.
            gather = [];
            // Push onto the gather.
            gather.push(match[2]);
            // Get the next match.
            match = regex.exec(search);
            // While we have more dashes...
            while(match && !check_for_word(match[1])) {
                // Push result on!
                gather.push(match[2]);
                // Get the next match to be checked.
                match = regex.exec(search);
            };
            // Push what was gathered onto the result.
            result.push(gather);
        }
        // Hand back the result.
        return result;
    };
    var output = capture(/([^-]+)(-\w+)/g, document.getElementById("input").innerHTML);
    document.getElementById("output").innerHTML = JSON.stringify(output);
</script>

With a slightly modified regex, you may get something closer to what you are looking for.

/[^-]+((?:-\w+[^-\w]*)+)/g

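The extra [^-\w]* allows some kind of separator between each dash word. The non-capturing group (?:) is then added so that + can match one or more dash words. We also no longer need the () around [^-]+ because, as you will see below, that data is no longer used. The first version is more flexible about what may separate the dash words, but I find this one cleaner.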

function capture(regex, search) {
    var 
	// The initial match.
	    match  = regex.exec(search),
	// Stores all of the results from the search.
	    result = [],
	// Used to gather results.
		gather;
	while(match) {
	    // Create something empty.
	    gather = [];
		
	    // Break up the large match.
	    var temp = match[1].split('-');
	    for (var i = 0; i < temp.length; i++) {
	        // Strip everything that is not a word character.
	        temp[i] = temp[i].replace(/\W+/g, "");
	        // Makes sure there was actually something to gather.
	        if (temp[i].length > 0)
	            gather.push("-" + temp[i]);
	    }
		
		// Push what was gathered onto the result.
		result.push(gather);
		
		// Get the next match.
		match = regex.exec(search);	
	};
	// Hand back the result.
	return result;
};
var output = capture(/[^-]+((?:-\w+[^-\w]*)+)/g, document.getElementById("input").innerHTML);
document.getElementById("output").innerHTML = JSON.stringify(output);
<p id = "input">
Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
My very own sentence!
-get
-all
-of
  -these!
</p>

<p id = "output">

</p>
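
The same idea can be tried outside the browser. The sketch below reuses the modified regex but pulls the individual words out of group 1 with a second match() call instead of split(). The helper name captureGroups and the inline input string are assumptions made for this example, not part of the answer.

```javascript
// Hypothetical helper (not part of the answer): group dash words by the sentence they follow.
function captureGroups(search) {
    // Same outer pattern as the answer; group 1 holds the whole run of dash words.
    const outer = /[^-]+((?:-\w+[^-\w]*)+)/g;
    const result = [];
    let match;
    while ((match = outer.exec(search)) !== null) {
        // Pull the individual dash words out of the captured run.
        result.push(match[1].match(/-\w+/g));
    }
    return result;
}

const input = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat\n';
console.log(JSON.stringify(captureGroups(input)));
// [["-foo","-bar"],["-baz","-bat"]]
```

Note that a sentence containing a hyphen of its own (for example "well-known") would be treated as the start of a dash word by this pattern.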

Answer 1 (score: 1)

Regexp capture groups do not work well for an unbounded number of groups. Splitting works better here:

var text = document.getElementById('text').textContent;
var blocks = text.split(/^(?!-)/m);
var result = blocks.map(function(block) {
  return block.split(/^-/m).slice(1).map(function(line) {
      return line.trim();
    });
});
document.getElementById('text').textContent = JSON.stringify(result);
<div id="text">Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
</div>
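
For readers who want to try the split-based approach without a page, here is a DOM-free sketch. The wrapper name splitBlocks and the inline string are assumptions for illustration. The two split() patterns are the ones from the answer.

```javascript
// Hypothetical helper wrapping the split-based approach from the answer.
function splitBlocks(text) {
    // Split before every line that does not start with a dash (i.e. each sentence).
    return text.split(/^(?!-)/m).map(function (block) {
        // Within a block, each "-" at the start of a line begins a new item.
        return block.split(/^-/m).slice(1).map(function (line) {
            return line.trim();
        });
    });
}

const text = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat\n';
console.log(JSON.stringify(splitBlocks(text)));
// [["foo","bar"],["baz","bat"]]
```

Unlike the regex answers, this returns the words without the leading dash, because the dash is consumed as the split separator.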

Answer 2 (score: 1)

If that is sufficient, just match two consecutive lines starting with -, preceded by a newline.

\n-(.*)\r?\n-(.*)

See the regex demo at regex101. To get the matches, use the exec() method:

var re = /\n-(.*)\r?\n-(.*)/g; var m;

var str = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat';

while ((m = re.exec(str)) !== null) {
  if (m.index === re.lastIndex) re.lastIndex++;
  document.write(m[1] + ',' + m[2] + '<br>');
}
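
The loop above writes to the page. A variant that collects the pairs into an array, so it can also run outside a browser, might look like the sketch below. The pairs array is an addition for this example. The regex and input string are the ones from the answer.

```javascript
const re = /\n-(.*)\r?\n-(.*)/g;
const str = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat';

const pairs = [];
let m;
while ((m = re.exec(str)) !== null) {
    // Guard against zero-length matches so the loop always advances.
    if (m.index === re.lastIndex) re.lastIndex++;
    pairs.push([m[1], m[2]]);
}

console.log(JSON.stringify(pairs));
// [["foo","bar"],["baz","bat"]]
```

Because the pattern has exactly two capture groups, a sentence followed by three or more dash lines would yield only the first two items.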