Question

Basically, I want to iterate over every part of a sentence, for example:

string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);

separated should then contain the following:

[1] = "How"
[2] = " "
[3] = "was"
[4] = " "
[5] = "your"
[6] = " "
[7] = "day"
[8] = " "
[9] = "-"
[10] = " "
[11] = "Andrew"
[12] = ","
[13] = " "
[14] = "Jane"
[15] = "?"

So far I can only get the words, using the regex "\w(?<!\d)[\w'-]*". How can I split the sentence into smaller parts, as in the example output?

Edit: The string does not contain any of the following:

i.e.
solid-form
8th, 1st, 2nd

Answer 1

Check this out:

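        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;

        // Group 1 repeats across the whole string, and each repetition is kept as a Capture:
        // a run of whitespace, a run of digits, a run of word characters,
        // or a single character that is none of those (punctuation).
        string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
        string input = "How was your 7 day - Andrew, Jane?";

        List<string> words = new List<string>();

        Regex regex = new Regex(pattern);

        // Run the match once; Success is true only if the whole input was consumed.
        Match match = regex.Match(input);

        if (match.Success)
        {
            foreach (Capture capture in match.Groups[1].Captures)
                words.Add(capture.Value);
        }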

Answer 2

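I'd suggest implementing a simple lexer (if such a thing exists) that reads one character at a time and produces the output you're looking for. It isn't the simplest solution, but it has the advantage of being extensible in case your use cases get more complicated, as @AndreCalil suggested.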
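
To make the idea concrete, here is a minimal sketch of such a character-by-character lexer. The names SimpleLexer, Kind and Tokenize are made up for illustration, and the classification rules (letters, digits and underscore form words; each punctuation character is its own token) are an assumption based on the sample sentence in the question, not a definitive implementation.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SimpleLexer
{
    // Hypothetical character classes; the names are illustrative only.
    private enum Kind { Word, Space, Punctuation }

    private static Kind Classify(char c)
    {
        if (char.IsLetterOrDigit(c) || c == '_') return Kind.Word;
        if (char.IsWhiteSpace(c)) return Kind.Space;
        return Kind.Punctuation;
    }

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        Kind currentKind = Kind.Word; // only meaningful while current is non-empty

        foreach (char c in input)
        {
            Kind kind = Classify(c);

            // Close the running token when the character class changes.
            // Punctuation never accumulates: each mark is emitted on its own.
            if (current.Length > 0 && (kind != currentKind || kind == Kind.Punctuation))
            {
                tokens.Add(current.ToString());
                current.Clear();
            }

            current.Append(c);
            currentKind = kind;
        }

        // Flush whatever is left at the end of the input.
        if (current.Length > 0)
            tokens.Add(current.ToString());

        return tokens;
    }
}
```

Calling SimpleLexer.Tokenize("How was your day - Andrew, Jane?") yields the 15 parts listed in the question. New token rules (for example, keeping "8th" or "i.e." together) would go into Classify or into the flush condition, which is where this approach is easier to extend than a single regex.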

Answer 3

Why not something like this? It's tailored to your test case, but if you add more punctuation it may be what you're looking for.

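(\w+|[,\-?])

Edit: Ah, to steal from Andre's response, this is what I had in mind:

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;

// The hyphen is escaped so the class matches only ',', '-' and '?'.
// Unescaped, ",-?" would be read as a character range from ',' to '?'.
string pattern = @"(\w+|[,\-?])";
string input = "How was your 7 day - Andrew, Jane?";

List<string> words = new List<string>();

Regex regex = new Regex(pattern);

if (regex.IsMatch(input))
{
    // Each match is either a run of word characters or one punctuation mark.
    MatchCollection matches = regex.Matches(input);

    foreach (Match m in matches)
        words.Add(m.Groups[1].Value);
}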
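
The pattern in this answer skips whitespace, while the question also wants the spaces returned. One possible way to cover that with the same Regex.Matches approach is sketched below. The class name SentenceSplitter is made up, SeparateSentence is the name used in the question, and the pattern is an assumption about what should count as a token rather than a tested general solution.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // \w+     : a run of word characters
    // \s+     : a run of whitespace
    // [^\w\s] : any single character that is neither (punctuation)
    private static readonly Regex TokenPattern = new Regex(@"\w+|\s+|[^\w\s]");

    public static string[] SeparateSentence(string sentence)
    {
        var parts = new List<string>();

        // Matches walks the input left to right and returns non-overlapping matches.
        foreach (Match m in TokenPattern.Matches(sentence))
            parts.Add(m.Value);

        return parts.ToArray();
    }
}
```

Because the three alternatives together cover every character, concatenating the returned parts gives back the original sentence, and the sample sentence from the question comes out as 15 parts.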

Regex: how to get words, spaces and punctuation from a string

3 answers: