Regex to match punctuation followed by a space, but keep the punctuation

Date: 2016-04-29 01:52:51

Tags: javascript arrays regex string

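I have a large paragraph string that I'm trying to split into sentences with JavaScript's .split() method. I need a regex that matches a period or question mark [?.] followed by a space. However, I need to keep the period/question mark in the resulting array. Since JS has no positive lookbehind, how can I do this?

Edit: Example input: "This is sentence 1. This is sentence 2? This is sentence 3." Example output: ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]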

5 answers:

Answer 0 (score: 1)

Forget split(). You want match():

var text = "This is an example paragraph. Oh and it has a question? Ok it's followed by some other random stuff. Bye.";

// One or more allowed characters, then a period or question mark,
// then whitespace or the end of the string.
var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g);

alert(matches);

The resulting matches array contains each sentence:

    Array[4]
        0:"This is an example paragraph. "
        1:"Oh and it has a question? "
        2:"Ok it's followed by some other random stuff. "
        3:"Bye."

Here's a fiddle for further testing: https://jsfiddle.net/uds4cww3/

Edit: also updated so that it matches a sentence at the end of the string.
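
To check the match() approach against the sample input from the question, here is a minimal sketch. The helper name `splitSentences` is made up for illustration; the pattern is the one from this answer, and the `.trim()` step is an addition that drops the whitespace each match carries along. Note that the character class acts as a whitelist, so a character outside it (a hyphen, for example) cuts off the start of a sentence.

```javascript
// Hypothetical helper (not part of the original answer): wraps the answer's
// pattern and trims the whitespace that (\s|$) pulls into each match.
function splitSentences(text) {
  var pattern = /[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g;
  var matches = text.match(pattern) || []; // match() returns null when nothing matches
  return matches.map(function (s) { return s.trim(); });
}

console.log(splitSentences("This is sentence 1. This is sentence 2? This is sentence 3."));

// The hyphen is not in the character class, so the first sentence loses its start.
console.log(splitSentences("A well-known fact. Done."));
```

If the text may contain other characters (hyphens, colons, digits with decimal points), the character class would need to be extended, or a negated class could be used instead.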

Answer 1 (score: 1)

This regex will work:

([^?.]+[?.])(?:\s|$)

var str = 'This is sentence 1. This is sentence 2? This is sentence 3.';
var regex = /([^?.]+[?.])(?:\s|$)/gm;
var m;

while ((m = regex.exec(str)) !== null) {
    document.writeln(m[1] + '<br>');
}
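
If an array is wanted rather than output written to the page, the same pattern can be combined with `String.prototype.matchAll` (available in ES2020 environments). This is only a sketch; the function name `collectSentences` is an assumption made for illustration.

```javascript
// Hypothetical helper: gathers capture group 1 of every match into an array.
function collectSentences(str) {
  // Same pattern as in the answer; matchAll requires the g flag.
  var regex = /([^?.]+[?.])(?:\s|$)/gm;
  return Array.from(str.matchAll(regex), function (m) { return m[1]; });
}

console.log(collectSentences("This is sentence 1. This is sentence 2? This is sentence 3."));
```

Because the trailing whitespace sits in a non-capturing group outside group 1, the collected sentences need no trimming.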

Answer 2 (score: 0)

Maybe this one will give you your array items:

\b.*?[?\.](?=\s|$)
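
This pattern relies on a lookahead, `(?=\s|$)`, which JavaScript regexes do support. A minimal sketch of how it might be used with match() on the sample input from the question; the variable names are illustrative.

```javascript
var input = "This is sentence 1. This is sentence 2? This is sentence 3.";

// The lookahead checks for whitespace or end of string without consuming it,
// and \b makes each match start at a word, so the separating space is skipped.
var sentences = input.match(/\b.*?[?\.](?=\s|$)/g);

console.log(sentences);
```

Since the space is never consumed, the matches come back without leading or trailing whitespace.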


Answer 3 (score: 0)

This is tacky, but it works:

var breakIntoSentences = function(s) {
  var l = [];
  s.replace(/[^.?]+.?/g, a => l.push(a));
  return l;
}

breakIntoSentences("how? who cares.")
["how?", " who cares."]

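(How it really works: the RE matches a run of non-punctuation characters, followed by one optional character of any kind. Because the first part is greedy, that character can only be a punctuation mark; at the end of the string it matches nothing.)

This only captures the first of a run of punctuation marks, so breakIntoSentences("how???? who cares...") also returns ["how?", " who cares."]. To capture all of the punctuation, use /[^.?]+[.?]*/g as the RE instead.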
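
The difference between the two patterns can be seen side by side in a short sketch. It uses match() directly rather than the replace/push helper, and the variable names are illustrative.

```javascript
// The two patterns discussed above (variable names are illustrative).
var firstOnly = /[^.?]+.?/g;    // keeps at most one trailing character
var allPunct = /[^.?]+[.?]*/g;  // keeps the whole run of . and ? characters

var sample = "how???? who cares...";

console.log(sample.match(firstOnly));
console.log(sample.match(allPunct));
```

With the first pattern the extra punctuation marks are skipped, because a match cannot start on a `.` or `?`.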

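Edit: hahaha, Wavvves taught me about match(), which does exactly what the replace/push combination does. You learn something new every damn day.

In its most minimal form, supporting three punctuation marks and using ES6 syntax, we get: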

const breakIntoSentences = s => s.match(/[^.?,]+[.?,]*/g)
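
Two things are worth keeping in mind with this one-liner: match() returns null when nothing matches (for an empty string, say), and every piece after the first keeps its leading space. Also, with the comma in the character class, the text is split at commas as well. A hedged sketch of a guarded variant follows; the name `breakIntoSentencesSafe` is hypothetical.

```javascript
// Hypothetical variant of the one-liner: falls back to an empty array when
// match() returns null, and trims the leading space left on later pieces.
const breakIntoSentencesSafe = s =>
  (s.match(/[^.?,]+[.?,]*/g) || []).map(piece => piece.trim());

console.log(breakIntoSentencesSafe("Well, yes. No?"));
console.log(breakIntoSentencesSafe(""));
```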

Answer 4 (score: 0)

I guess .match will do it:

(?:\s?)(.*?[.?])

That is:

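var sentence = "This is sentence 1. This is sentence 2? This is sentence 3.";
// With the g flag, match() returns whole matches rather than capture groups,
// so the optional leading whitespace is still included; trim() removes it.
var result = sentence.match(/(?:\s?)(.*?[.?])/g);
for (var i = 0; i < result.length; i++) {
   document.write(result[i].trim() + "<br>");
}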