Need a C# Regex to get word pairs in a sentence

Time: 2011-07-14 14:59:14

Tags: c# regex

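Is there a regex that will take the following sentence:

"I want this split up into pairs"

and generate the following list:

"I want", "want this", "this split", "split up", "up into", "into pairs"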

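4 answers:

Answer 0 (score: 5)

Since each word has to be reused in the next pair, you need a lookahead assertion, which captures the following word without consuming it: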

Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}

For triplets:

Regex regexObj = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (      # and capture...
      \w+   # another word,
      \s+   # whitespace,
      \w+   # word.
     )      # End of capturing group 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
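
The pair and triplet patterns above differ only in how many words the lookahead captures, so they can be folded into one helper. The following is a minimal sketch, not part of the original answer: the class `WordGroups`, the method `Extract`, and the `size` parameter are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordGroups
{
    // Hypothetical helper (not from the answer): returns every run of
    // `size` consecutive words, reusing words between neighbouring results.
    public static List<string> Extract(string text, int size)
    {
        if (size < 2) throw new ArgumentOutOfRangeException(nameof(size));

        // Group 1 consumes one word plus its trailing whitespace. The lookahead
        // captures the next (size - 1) words in group 2 without consuming them,
        // so the following match can start at the second word.
        string pattern = @"(\w+\s+)(?=(\w+(?:\s+\w+){" + (size - 2) + "}))";

        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            results.Add(m.Groups[1].Value + m.Groups[2].Value);
        }
        return results;
    }
}
```

Calling `WordGroups.Extract(sentence, 2)` should give the pairs and `WordGroups.Extract(sentence, 3)` the triplets. Whitespace between words is kept exactly as it appears in the input, as in the answer's code.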

Answer 1 (score: 4)

You can do this without a regex:

var myWords = myString.Split(' ');

var myPairs = myWords.Take(myWords.Length - 1)
    .Select((w, i) => w + " " + myWords[i + 1]);
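
Splitting on a single space produces empty entries when the sentence contains repeated spaces. A possible variation, assuming any run of whitespace should count as one separator, is sketched below; `SplitPairs` and `FromSentence` are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitPairs
{
    // Hypothetical helper: pairs each word with the word that follows it.
    public static List<string> FromSentence(string sentence)
    {
        // A null separator array splits on any whitespace; RemoveEmptyEntries
        // drops the empty strings produced by repeated spaces.
        string[] words = sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Zip stops at the shorter sequence, so the last word gets no partner.
        return words.Zip(words.Skip(1), (first, second) => first + " " + second).ToList();
    }
}
```

Unlike the regex approach, this normalizes the gap between the two words to a single space.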

Answer 2 (score: 3)

You can use string.Split() and combine the results:

var words = myString.Split(new char[] { ' ' });
var pairs = new List<string>();

for (int i = 0; i < words.Length - 1; i++)
{
    // Re-insert the space that Split() removed between the two words.
    pairs.Add(words[i] + " " + words[i + 1]);
}

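Answer 3 (score: 0)

To use only RegEx and no post-processing, we can reuse Tim Pietzcker's answer (the lookahead pattern shown above), but run the text through RegEx twice.

The first pass uses the original pattern from Tim Pietzcker's answer. The second pass uses the same pattern with a lookbehind added, which makes the regex start capturing from the second word.

If you merge the results of both RegEx passes, you will have all the pairs in the text.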

Regex regexObj1 = new Regex(
    @"(     # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResult = regexObj1.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value + matchResult.Groups[2].Value);
    matchResult = matchResult.NextMatch();
}

Regex regexObj2 = new Regex(
    @"(?<=  # Assert that this precedes the match, without capturing it:
     \w+\s+ # the first word followed by whitespace.
    )
    (       # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResult1 = regexObj1.Match(subjectString);
Match matchResult2 = regexObj2.Match(subjectString);
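
The answer does not show the merge step itself. One way it might look is sketched below, assuming duplicates should be collapsed: on the sample sentence the two passes overlap, because the second pass finds every pair except the first one. `MergedPairs` and `Collect` are hypothetical names, and the patterns are the two from this answer written without comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MergedPairs
{
    // The two patterns from the answer, without the inline comments.
    private static readonly Regex First = new Regex(@"(\w+\s+)(?=(\w+))");
    private static readonly Regex Second = new Regex(@"(?<=\w+\s+)(\w+\s+)(?=(\w+))");

    // Hypothetical helper: runs both patterns and keeps each pair once,
    // in order of first appearance.
    public static List<string> Collect(string text)
    {
        var seen = new HashSet<string>();
        var merged = new List<string>();
        foreach (Regex regex in new[] { First, Second })
        {
            for (Match m = regex.Match(text); m.Success; m = m.NextMatch())
            {
                string pair = m.Groups[1].Value + m.Groups[2].Value;
                // HashSet.Add returns false for a pair that was already seen.
                if (seen.Add(pair)) merged.Add(pair);
            }
        }
        return merged;
    }
}
```

Note that de-duplicating this way also collapses pairs that genuinely occur more than once in the text, so it is only suitable when each distinct pair is wanted once.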

For triplets:

You need to add a third RegEx to your program:

Regex regexObj3 = new Regex(
    @"(?<=  # Assert that this precedes the match, without capturing it:
     \w+\s+\w+\s+ # the first and second words, each followed by whitespace.
    )
    (       # Match and capture in backreference no. 1:
     \w+    # one or more alphanumeric characters
     \s+    # one or more whitespace characters.
    )       # End of capturing group 1.
    (?=     # Assert that there follows...
     (\w+)  # another word; capture that into backref 2.
    )       # End of lookahead.", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResult1 = regexObj1.Match(subjectString);
Match matchResult2 = regexObj2.Match(subjectString);
Match matchResult3 = regexObj3.Match(subjectString);