Regex to match a sentence

Date: 2011-04-05 14:20:49

Tags: java regex

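How can I match a sentence of the form "Hello world" or "Hello World"? The sentence may contain "-", "/" and the digits 0-9. Any information would be very helpful to me. Thanks.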

2 answers:

Answer 0 (score: 23)

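This one should do a pretty good job. My definition of a sentence: a sentence begins with a non-whitespace character and ends with a period, exclamation point or question mark (or the end of the string). There may be a closing quote after the ending punctuation.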

[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String subjectString = 
        "This is a sentence. " +
        "So is \"this\"! And is \"this?\" " +
        "This is 'stackoverflow.com!' " +
        "Hello World";
        Pattern re = Pattern.compile(
            "# Match a sentence ending in punctuation or EOS.\n" +
            "[^.!?\\s]    # First char is non-punct, non-ws\n" +
            "[^.!?]*      # Greedily consume up to punctuation.\n" +
            "(?:          # Group for unrolling the loop.\n" +
            "  [.!?]      # (special) inner punctuation ok if\n" +
            "  (?!['\"]?\\s|$)  # not followed by ws or EOS.\n" +
            "  [^.!?]*    # Greedily consume up to punctuation.\n" +
            ")*           # Zero or more (special normal*)\n" +
            "[.!?]?       # Optional ending punctuation.\n" +
            "['\"]?       # Optional closing quote.\n" +
            "(?=\\s|$)", 
            Pattern.MULTILINE | Pattern.COMMENTS);
        Matcher reMatcher = re.matcher(subjectString);
        while (reMatcher.find()) {
            System.out.println(reMatcher.group());
        } 
    }
}

Here is the output:

This is a sentence.
So is "this"!
And is "this?"
This is 'stackoverflow.com!'
Hello World

Correctly matching all of these (the last one has no ending punctuation) turns out to be harder than it looks!
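For comparison, here is a minimal sketch (not part of the original answer) that uses the one-line form of the pattern shown above instead of the commented Pattern.COMMENTS version. The pattern is escaped as a Java string literal (backslashes doubled, double quotes escaped) and the matches are collected into a List. The class name SentenceSplitter and the method splitSentences are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // One-line version of the answer's pattern, escaped for a Java string literal.
    // No COMMENTS flag is needed because the pattern contains no spaces or # comments.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE);

    // Hypothetical helper: returns every sentence found in text, in order of appearance.
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. So is \"this\"! And is \"this?\" "
                + "This is 'stackoverflow.com!' Hello World";
        for (String s : splitSentences(text)) {
            System.out.println(s);
        }
    }
}
```

Run on the same input as the answer, this should print the same five lines, including the final "Hello World" that has no ending punctuation.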

Answer 1 (score: 0)

If by "sentence" you mean anything that ends with punctuation, try: (.*?)[.?!]

Explanation:

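  • .* matches any run of characters (by default excluding line terminators). Adding ? makes it lazy (non-greedy), so it matches the shortest possible string.
  • [.?!] matches any one of the three punctuation characters.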
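A small sketch of the greedy versus lazy behaviour described in the list above. The class LazySentenceDemo and the helper captures are hypothetical names for illustration. The input also shows two limitations of this simpler pattern compared with answer 0: the dot inside "stackoverflow.com" ends a match early, and trailing text with no punctuation ("Hello World") is never matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazySentenceDemo {
    // Hypothetical helper: collects capture group 1 of every match of regex in text.
    public static List<String> captures(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Visit stackoverflow.com now! Hello World";
        // Lazy: each match stops at the FIRST punctuation mark, so the domain's dot splits it.
        System.out.println(captures("(.*?)[.?!]", text)); // prints [Visit stackoverflow, com now]
        // Greedy: .* runs to the end, then backtracks to the LAST punctuation mark.
        System.out.println(captures("(.*)[.?!]", text));  // prints [Visit stackoverflow.com now]
    }
}
```

In both cases "Hello World" is dropped because nothing after it matches [.?!].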