Regex to match blocks of multi-line text?

Asked: 2014-09-10 15:08:33

Tags: c# regex

I have a text file containing more than 200 records in the following format:

 @INPROCEEDINGS{Rajan-Sullivan03,
  author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}

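Basically, each record starts with @ and ends with }, so I tried to match from @ up to }\n. That did not work: it only matches the first record and misses the other one, because there is no newline after it.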

            string pattern = @"(^@)([\s\S]*)(}$\n}(\n))";

When I tried to fix it by making the trailing newline optional, it matched everything as a single match:

 string pattern = @"(^@)([\s\S]*)(}$\n}(\n*))";

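I kept trying until I arrived at the pattern below, but it does not work. I would appreciate it if you could fix it, or suggest a more efficient one along with some explanation.

Here is my code: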

string pattern = @"(^@)([\s\S]*)(}$\n}(\n))";
Regex regex = new Regex(pattern, RegexOptions.Multiline);
var matches = regex.Matches(bibFileContent).Cast<Match>().Select(m => m.Value).ToList();

4 Answers:

Answer 0 (score: 2)

If you use the Matches method, you need a pattern like this one, which handles balanced braces:

string pattern = @"@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))}";
Regex regex = new Regex(pattern);

Or, to ensure that all results are well-formed (with respect to braces):

string pattern = @"\G[^{}]*(@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))})";

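Both patterns use a named capture as a counter. When an opening brace is encountered, the counter is incremented; when a closing brace is encountered, it is decremented. (?(open)(?!)) is a conditional test that makes the pattern fail if the counter is not zero.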
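To see the counter at work, here is a minimal sketch that runs the first pattern on a made-up input. The sample text and the variable names are my own assumptions, and the snippet uses top-level statements, so it assumes C# 9 or later. The third record never closes its braces, so the conditional test should reject it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input: two complete records and one truncated record.
string sample =
    "@ARTICLE{a1,\n  title = {{Nested} braces}\n}\n\n" +
    "@BOOK{b2,\n  year = {2006}\n}\n\n" +
    "@MISC{broken,\n  note = {never closed\n";

// Same pattern as above: the "open" group counts braces that are still unmatched.
string pattern = @"@[A-Z]+{(?>[^{}]+|(?<open>{)|(?<-open>}))*(?(open)(?!))}";

string[] records = Regex.Matches(sample, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Print each matched record followed by a separator line.
foreach (string record in records)
{
    Console.WriteLine(record);
    Console.WriteLine("---");
}
```

Only the two complete records are returned; the truncated one is skipped because the counter is still non-zero when the input ends.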

If the chunks do not contain the @ character, it is more convenient to use the Regex.Split(input, pattern) method:

string[] result = Regex.Split(input, @"[^}]*(?=@)");

If the chunks can contain the @ character, you can make it more robust with a more descriptive lookahead:

string[] result = Regex.Split(input, @"[^}]*(?=@[A-Z]+{)");

string[] result = Regex.Split(input, @"\s*(?=@[A-Z]+{)");
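One thing to watch for with Split: because the lookahead lets the pattern match an empty string right in front of a record header, the result can contain empty pieces, and the last chunk keeps its trailing newline. A rough sketch of how I would clean that up follows; the sample input and the names `chunks` and `chunk` are assumptions, and it again assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input; one field value contains an @ character.
string input =
    "@ARTICLE{a1,\n  note = {mail: someone@example.org}\n}\n\n" +
    "@BOOK{b2,\n  year = {2006}\n}\n";

// Split on the whitespace that precedes each "@TYPE{" header.
string[] chunks = Regex.Split(input, @"\s*(?=@[A-Z]+{)")
    .Select(chunk => chunk.Trim())      // drop leading/trailing whitespace
    .Where(chunk => chunk.Length > 0)   // drop empty pieces from zero-length matches
    .ToArray();

// Print each chunk in brackets so its boundaries are visible.
foreach (string chunk in chunks)
{
    Console.WriteLine("[" + chunk + "]");
}
```

The @ inside the note field does not cause a split here, since it is not followed by uppercase letters and an opening brace.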

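Answer 1 (score: 1)

I think the problem is that your input does not end with \n, so your second record is not matched. You should change the pattern to use $.

This captures the body of each record (everything between the leading @ and the closing }) in group 1: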

@(.*?)^}(?:[\r\n]+|$)

Note that you must use the m (multiline) and s (single-line) modifiers.

Use this code:

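string pattern = @"@(.*?)^}(?:[\r\n]+|$)";
Regex regex = new Regex(pattern, RegexOptions.Multiline | RegexOptions.Singleline);
MatchCollection mc = regex.Matches(bibFileContent);
List<string> results = new List<string>();
// Iterate over every match (one per record), not over the groups of the first match only.
foreach (Match m in mc)
{
    // Group 1 holds the text between the leading @ and the closing }.
    results.Add(m.Groups[1].Value);
}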

Answer 2 (score: 1)

You can use a simple regex like this:

(@[^@]+)

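The idea is to match content that starts with @ and contains no other @. By the way, if you only want to match the pattern rather than capture it, just remove the capturing group: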

@[^@]+
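For completeness, a small sketch of how this could be used from C#. The sample text and variable names are assumptions, and it assumes C# 9 or later for top-level statements. Each match runs up to the next @, so it includes the blank lines between records; I trim those off here. It also relies on no field value containing an @.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input with two short records and no @ inside any field.
string text =
    "@ARTICLE{a1,\n  year = {2003}\n}\n\n" +
    "@BOOK{b2,\n  year = {2006}\n}\n";

string[] entries = Regex.Matches(text, @"@[^@]+")
    .Cast<Match>()
    .Select(m => m.Value.TrimEnd())   // remove the newlines that trail each record
    .ToArray();

// Print each record followed by a separator line.
foreach (string entry in entries)
{
    Console.WriteLine(entry);
    Console.WriteLine("---");
}
```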

Answer 3 (score: 1)

This looks like a candidate for balancing groups.

 # @"(?m)^[^\S\r\n]*@[^{}]+(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\})"

 (?m)
 ^ [^\S\r\n]* 
 @ [^{}]+ 
 (?:
      \{                            # Match opening {
      (?>                           # Then either match (possessively):
           [^{}]+                        #   Anything (but only if we're not at the start of { or } )
        |                              # or
           \{                            #  { (and increase the braces counter)
           (?<Depth> )
        |                              # or
           \}                            #  } (and decrease the braces counter).
           (?<-Depth> )
      )*                            # Repeat as needed.
      (?(Depth)                     # Assert that the braces counter is at zero.
           (?!)                          # Fail if it isn't
      )
      \}                            # Then match a closing }. 
 )

Code sample

Regex FghRx = new Regex( @"(?m)^[^\S\r\n]*@[^{}]+(?:\{(?>[^{}]+|\{(?<Depth>)|\}(?<-Depth>))*(?(Depth)(?!))\})" );
string FghData =
@"
@INPROCEEDINGS{Rajan-Sullivan03,
author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}
";

Match FghMatch = FghRx.Match(FghData);
while (FghMatch.Success)
{
    Console.WriteLine("New Record\n------------------------");
    Console.WriteLine("{0}", FghMatch.Groups[0].Value);
    FghMatch = FghMatch.NextMatch();
    Console.WriteLine("");
}

Output

New Record
------------------------
@INPROCEEDINGS{Rajan-Sullivan03,
author = {Hridesh Rajan and Kevin J. Sullivan},
  title = {{{Eos}: Instance-Level Aspects for Integrated System Design}},
  booktitle = {ESEC/FSE 2003},
  year = {2003},
  pages = {297--306},
  month = sep,
  isbn = {1-58113-743-5},
  location = {Helsinki, FN},
  owner = {Administrator},
  timestamp = {2009.03.08}
}

New Record
------------------------
@INPROCEEDINGS{ras-mor-models-06,
  author = {Awais Rashid and Ana Moreira},
  title = {Domain Models Are {NOT} Aspect Free},
  booktitle = {MoDELS},
  year = {2006},
  editor = {Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio},
  volume = {4199},
  series = {Lecture Notes in Computer Science},
  pages = {155--169},
  publisher = {Springer},
  bibdate = {2006-12-07},
  bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/models/models2006.html#RashidM06},
  isbn = {3-540-45772-0},
  owner = {aljasser},
  timestamp = {2008.09.16},
  url = {http://dx.doi.org/10.1007/11880240_12}
}