Question

I'm new to regular expressions in JavaScript, and I'm having trouble getting an array of matches from a text string like this:

Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat

I would like to get an array of matches like this:

match[0] = [
    'foo',
    'bar'
]
match[1] = [
    'baz',
    'bat'
]

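In short, what I'm looking for is:

"Any dash+word (-foo, -bar, etc.) AFTER a sentence"

Can anyone provide a formula that captures all iterations instead of only the last one? A repeated capturing group only captures its last iteration. Forgive me if this is a dumb question. I'm using regex101 if anyone wants to send me some tests.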
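The limitation described above can be reproduced with a small sketch. The pattern and variable names here are hypothetical, chosen only for illustration, and are not taken from the original question.

```javascript
// Hypothetical pattern: one dash-word per iteration of the repeated group.
var repeated = /(?:(-\w+)\s*)+/;
var m = repeated.exec("-foo\n-bar");

// The overall match spans both dash-words...
console.log(JSON.stringify(m[0]));
// ...but the capturing group only remembers its final iteration.
console.log(m[1]);
```

Here m[0] covers both dash-words while m[1] holds only "-bar", which is why the answers below either call exec() repeatedly or split the text instead.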

Answer 1

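The first regex I came up with is the following:

/([^-]+)(-\w+)/g

The first group, ([^-]+), grabs everything that is not a dash. It is followed by the capture group we actually want, (-\w+). We add the g flag so that the regex object keeps track of the last position it looked at. This means that every time we run regex.exec(search), we get the next match that you see in regex101.

Note: JavaScript's \w is equivalent to [a-zA-Z0-9_]. So if you only want letters, use this instead of \w: [a-zA-Z]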
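As a rough illustration of how the g flag makes exec() step through the input, the sketch below records regex.lastIndex after each call. The sample string and the variable names are assumptions made for this example.

```javascript
var regex = /([^-]+)(-\w+)/g;
var search = "Sentence would go here\n-foo\n-bar";
var found = [];
var match;

// Each exec() call resumes at regex.lastIndex, so the loop visits every match once.
while ((match = regex.exec(search)) !== null) {
    found.push({ dashWord: match[2], lastIndex: regex.lastIndex });
}

// When exec() finally returns null, lastIndex is reset to 0.
console.log(JSON.stringify(found), regex.lastIndex);
```

Without the g flag, lastIndex is ignored and exec() would return the first match on every call, so a loop like this would never terminate.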
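A quick sketch of the difference between the two character classes; the sample text is made up for illustration.

```javascript
// Made-up sample containing a digit and an underscore after the dash.
var sample = "Options\n-foo_1\n-bar2";

// \w accepts letters, digits and underscores.
var withWordChars = sample.match(/-\w+/g);
// [a-zA-Z] stops at the first character that is not a letter.
var lettersOnly = sample.match(/-[a-zA-Z]+/g);

console.log(withWordChars, lettersOnly);
```

With \w the matches are "-foo_1" and "-bar2"; with [a-zA-Z] they are cut short to "-foo" and "-bar".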

Below is the code that implements this regex.

<p id = "input">
    Sentence would go here
    -foo
    -bar
    Another sentence would go here
    -baz
    -bat
</p>

<p id = "output">

</p>

<script>
    // Returns true if the text before a dash-word contains a word character,
    // i.e. it is a sentence rather than just whitespace between dash-words.
    function check_for_word(search) {return /\w/.test(search);}
    function capture(regex, search) {
        var 
        // The initial match.
            match  = regex.exec(search),
        // Stores all of the results from the search.
            result = [],
        // Used to gather results.
            gather;
        while(match) {
            // Create something empty.
            gather = [];
            // Push onto the gather.
            gather.push(match[2]);
            // Get the next match.
            match = regex.exec(search);
            // While we have more dashes...
            while(match && !check_for_word(match[1])) {
                // Push result on!
                gather.push(match[2]);
                // Get the next match to be checked.
                match = regex.exec(search);
            };
            // Push what was gathered onto the result.
            result.push(gather);
        }
        // Hand back the result.
        return result;
    };
    var output = capture(/([^-]+)(-\w+)/g, document.getElementById("input").innerHTML);
    document.getElementById("output").innerHTML = JSON.stringify(output);
</script>

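With a slightly modified regex, you may get closer to what you are looking for.

/[^-]+((?:-\w+[^-\w]*)+)/g

The extra bit, [^-\w]*, allows for some kind of separation between each dash-word. A non-capturing group (?:) is then added so that + can match one or more dash-words. We also no longer need the () around [^-]+, because, as you will see below, that data is no longer used. The first version is more flexible about what can appear between dashes, but I find this one cleaner.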

function capture(regex, search) {
    var 
	// The initial match.
	    match  = regex.exec(search),
	// Stores all of the results from the search.
	    result = [],
	// Used to gather results.
		gather;
	while(match) {
	    // Create something empty.
	    gather = [];
		
	    // Break up the large match into its dash-words.
	    var temp = match[1].split('-');
	    for (var i = 0; i < temp.length; i++) {
	        // Strip non-word characters such as whitespace and punctuation.
	        temp[i] = temp[i].replace(/\W/g, "");
	        // Makes sure there was actually something to gather.
	        if (temp[i].length > 0)
	            gather.push("-" + temp[i]);
	    }
		
		// Push what was gathered onto the result.
		result.push(gather);
		
		// Get the next match.
		match = regex.exec(search);	
	};
	// Hand back the result.
	return result;
};
var output = capture(/[^-]+((?:-\w+[^-\w]*)+)/g, document.getElementById("input").innerHTML);
document.getElementById("output").innerHTML = JSON.stringify(output);

<p id = "input">
Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
My very own sentence!
-get
-all
-of
  -these!
</p>

<p id = "output">

</p>
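To see what the modified regex hands back before capture() breaks it apart, the sketch below collects the raw contents of the single capture group. It also shows one possible alternative to the split('-') step, namely running a second, simpler regex over each group. That alternative is an assumption for illustration, not the approach used in the answer, and it does not cut words at punctuation in the way capture() does.

```javascript
var regex = /[^-]+((?:-\w+[^-\w]*)+)/g;
var search = "Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat";
var groups = [];
var match;

// Each match holds one whole run of dash-words, separators included.
while ((match = regex.exec(search)) !== null) {
    groups.push(match[1]);
}

// Alternative to split('-'): pull the dash-words out of each run directly.
var dashWords = groups.map(function (group) {
    return group.match(/-\w+/g);
});

console.log(JSON.stringify(groups));
console.log(JSON.stringify(dashWords));
```

The first log shows two strings, one per sentence, and the second shows the nested arrays of dash-words extracted from them.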

Answer 2

Regexp captures do not work well for an unbounded number of groups. Splitting works better here instead:


var text = document.getElementById('text').textContent;
var blocks = text.split(/^(?!-)/m);
var result = blocks.map(function(block) {
  return block.split(/^-/m).slice(1).map(function(line) {
      return line.trim();
    });
});
document.getElementById('text').textContent = JSON.stringify(result);


<div id="text">Sentence would go here
-foo
-bar
Another sentence would go here
-baz
-bat
</div>
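The same splitting idea can be wrapped in a function that takes a plain string, which makes it easy to try outside a browser. The function name splitIntoGroups and the sample string are hypothetical and only serve this sketch.

```javascript
// Hypothetical helper: the two-level split from the snippet above, without the DOM.
function splitIntoGroups(text) {
    // First split at every line start that is not followed by a dash,
    // which yields one block per sentence.
    return text.split(/^(?!-)/m).map(function (block) {
        // Then split each block at its leading dashes and drop the sentence itself.
        return block.split(/^-/m).slice(1).map(function (line) {
            return line.trim();
        });
    });
}

var sample = "Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat\n";
var grouped = splitIntoGroups(sample);
console.log(JSON.stringify(grouped));
```

Note that a sentence with no dash lines after it still produces a block, so it shows up as an empty array in the result.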


Answer 3

If it is sufficient for your case, just match two lines that start with - and are preceded by a newline.

\n-(.*)\r?\n-(.*)

See the regex demo at regex101. To get the matches, use the exec() method.

var re = /\n-(.*)\r?\n-(.*)/g; var m;

var str = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat';

while ((m = re.exec(str)) !== null) {
  if (m.index === re.lastIndex) re.lastIndex++;
  document.write(m[1] + ',' + m[2] + '<br>');
}
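To get the nested array the question asks for rather than writing to the page, the same regex can be combined with String.prototype.matchAll (ES2020). This is a swapped-in variation for illustration, not part of the original answer, and the variable name pairs is an assumption.

```javascript
var re = /\n-(.*)\r?\n-(.*)/g;
var str = 'Sentence would go here\n-foo\n-bar\nAnother sentence would go here\n-baz\n-bat';

// matchAll() requires the g flag and yields one match object per pair of dash lines.
var pairs = Array.from(str.matchAll(re), function (m) {
    return [m[1], m[2]];
});

console.log(JSON.stringify(pairs));
```

Because the pattern spells out exactly two dash lines, a third dash line in the same group would not be captured.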
