Regular expression

Date: 2015-04-04 11:10:47

Tags: c# regex split

What regular expression would extract the words of a tagged sentence that follow this tag order:

(B-NP)-(B-VP)-(B-NP)-(I-NP) or (B-NP)-(I-NP)-(B-VP)-(B-NP)-(I-NP).

Example sentence:

(B-SBAR)After(B-SBAR) (B-NP)Chuck(B-NP) (I-NP)and(I-NP) (I-NP)David(I-NP) (B-VP)leave(B-VP) (B-NP)the(B-NP) (I-NP)gang(I-NP) (O),(O) (B-NP)the(B-NP) (I-NP)remaining(I-NP) (I-NP)group(I-NP) (B-ADVP)also(B-ADVP) (B-VP)split(B-VP) (B-PRT)up(B-PRT) (B-NP)into(B-NP) (I-NP)2(I-NP) (I-NP)groups(I-NP) (B-PP)of(B-PP) (B-NP)2(B-NP) (O)and(O) (B-VP)get(B-VP) (I-VP)to(I-VP) (I-VP)know(I-VP) (B-NP)each(B-NP) (I-NP)other(I-NP) (I-NP)a(I-NP) (I-NP)little(I-NP) (I-NP)better(I-NP) (O).(O)

It should be split into:

  • Chuck leave the gang
  • the remaining split into 2
  • 2 get each other
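Each token in the sentence has the form (TAG)word(TAG), with the same tag on both sides of the word. A minimal C# sketch (not from the question; the Tokenize helper is hypothetical) that pulls out the (tag, word) pairs with a backreference, assuming every token follows that layout:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns one (Tag, Word) pair per "(TAG)word(TAG)" token.
// Assumes the closing tag always repeats the opening tag exactly.
static List<(string Tag, string Word)> Tokenize(string sentence)
{
    // (?<tag>...) captures e.g. "B-NP"; \k<tag> requires the same tag after the word.
    // The word may be punctuation such as ",", so it is any run without parentheses or spaces.
    var token = new Regex(@"\((?<tag>[^()\s]+)\)(?<word>[^()\s]+)\(\k<tag>\)");
    return token.Matches(sentence)
        .Cast<Match>()
        .Select(m => (Tag: m.Groups["tag"].Value, Word: m.Groups["word"].Value))
        .ToList();
}

var pairs = Tokenize("(B-NP)Chuck(B-NP) (I-NP)and(I-NP) (B-VP)leave(B-VP) (O),(O)");
foreach (var (tag, word) in pairs)
    Console.WriteLine($"{tag}\t{word}");
```

With the tokens in hand, the question reduces to finding tag sequences such as B-NP, B-VP, B-NP, I-NP in that list.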

1 Answer:

Answer 0 (score: 1)

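You can do this with named capture groups. Fortunately, the .NET regex engine allows several named groups to share the same name, so both alternatives of the pattern can write to the same groups (FstWrd, SndWrd, and so on).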
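A small sketch of that .NET feature in isolation: the two alternatives below both define groups named tag and word, and whichever alternative matches fills the shared groups. The pattern is illustrative only, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Both alternatives declare the groups "tag" and "word"; .NET treats them as one group each.
var rx = new Regex(@"\((?<tag>B-NP)\)(?<word>\w+)\(B-NP\)|\((?<tag>B-VP)\)(?<word>\w+)\(B-VP\)");

foreach (Match m in rx.Matches("(B-NP)Chuck(B-NP) (B-VP)leave(B-VP)"))
    Console.WriteLine($"{m.Groups["tag"].Value}: {m.Groups["word"].Value}");
// prints:
// B-NP: Chuck
// B-VP: leave
```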

using System.Linq;
using System.Text.RegularExpressions;
var str = "(B-SBAR)After(B-SBAR) (B-NP)Chuck(B-NP) (I-NP)and(I-NP) (I-NP)David(I-NP) (B-VP)leave(B-VP) (B-NP)the(B-NP) (I-NP)gang(I-NP) (O),(O) (B-NP)the(B-NP) (I-NP)remaining(I-NP) (I-NP)group(I-NP) (B-ADVP)also(B-ADVP) (B-VP)split(B-VP) (B-PRT)up(B-PRT) (B-NP)into(B-NP) (I-NP)2(I-NP) (I-NP)groups(I-NP) (B-PP)of(B-PP) (B-NP)2(B-NP) (O)and(O) (B-VP)get(B-VP) (I-VP)to(I-VP) (I-VP)know(I-VP) (B-NP)each(B-NP) (I-NP)other(I-NP) (I-NP)a(I-NP) (I-NP)little(I-NP) (I-NP)better(I-NP) (O).(O)";
var rx = new Regex(@"(?<FstTag>\(B-NP\))(?<FstWrd>\w+)\k<FstTag>.*?(?<SndTag>\(B-VP\))(?<SndWrd>\w+)\k<SndTag>.*?(?<TrdTag>\(B-NP\))(?<TrdWrd>\w+)\k<TrdTag>.*?(?<FthTag>\(I-NP\))(?<FthWrd>\w+)\k<FthTag>|(?<FstTag>\(B-NP\))(?<FstWrd>\w+)\k<FstTag>.*?(?<SndTag>\(I-NP\))(?<SndWrd>\w+)\k<SndTag>.*?(?<TrdTag>\(B-VP\))(?<TrdWrd>\w+)\k<TrdTag>.*?(?<FthTag>\(B-NP\))(?<FthWrd>\w+)\k<FthTag>.*?(?<FfhTag>\(I-NP\))(?<FfhWrd>\w+)\k<FfhTag>");
// Join only the word groups that captured, so 4-word matches don't end with a stray space.
var ms = rx.Matches(str).Cast<Match>().Select(p => string.Join(" ", new[] { "FstWrd", "SndWrd", "TrdWrd", "FthWrd", "FfhWrd" }.Select(g => p.Groups[g].Value).Where(v => v.Length > 0))).ToList();
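The hand-written pattern generalizes to any list of tag sequences. A sketch, not the answerer's code: the helper SplitByTagSequences and the group names W0, W1 and so on are hypothetical, and it assumes words are \w+ and that anything may sit between the wanted tokens (the lazy .*? used above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds one alternation from several tag sequences, joining
// tokens with lazy .*? as the hand-written pattern does, and returns each match's
// words joined by single spaces.
static List<string> SplitByTagSequences(string sentence, params string[][] sequences)
{
    var alternatives = sequences.Select(seq => string.Join(".*?",
        seq.Select((tag, i) =>
        {
            var t = Regex.Escape(tag);
            // W0, W1, ... are reused by every alternative; .NET allows duplicate group names.
            return $@"\({t}\)(?<W{i}>\w+)\({t}\)";
        })));
    var rx = new Regex(string.Join("|", alternatives));
    int maxLen = sequences.Max(seq => seq.Length);

    return rx.Matches(sentence)
        .Cast<Match>()
        .Select(m => string.Join(" ",
            Enumerable.Range(0, maxLen)
                .Select(i => m.Groups["W" + i])
                .Where(g => g.Success)   // skip groups the winning alternative never set
                .Select(g => g.Value)))
        .ToList();
}

var text = "(B-SBAR)After(B-SBAR) (B-NP)Chuck(B-NP) (I-NP)and(I-NP) (I-NP)David(I-NP) (B-VP)leave(B-VP) (B-NP)the(B-NP) (I-NP)gang(I-NP) (O),(O) (B-NP)the(B-NP) (I-NP)remaining(I-NP) (I-NP)group(I-NP) (B-ADVP)also(B-ADVP) (B-VP)split(B-VP) (B-PRT)up(B-PRT) (B-NP)into(B-NP) (I-NP)2(I-NP) (I-NP)groups(I-NP) (B-PP)of(B-PP) (B-NP)2(B-NP) (O)and(O) (B-VP)get(B-VP) (I-VP)to(I-VP) (I-VP)know(I-VP) (B-NP)each(B-NP) (I-NP)other(I-NP) (I-NP)a(I-NP) (I-NP)little(I-NP) (I-NP)better(I-NP) (O).(O)";
var parts = SplitByTagSequences(text,
    new[] { "B-NP", "B-VP", "B-NP", "I-NP" },
    new[] { "B-NP", "I-NP", "B-VP", "B-NP", "I-NP" });
foreach (var part in parts)
    Console.WriteLine(part);
```

On the example sentence this prints "Chuck leave the gang", "the split into 2" and "2 get each other". At each starting position the first alternative is tried first, and its lazy .*? skips "remaining group also" between "the" and "split"; the answer's pattern uses the same alternation order, so it behaves the same way. Reordering the sequences changes which alternative wins.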
