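Regex question: match until the next match or the end of the file

Time: 2011-01-25 16:26:14

Tags: c# regex

I'm working on a document parser to extract data from some documents I've been given, and I'm coding it in C#. The file format is as follows: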


(Type 1): (potentially multi-lined string)
(Type 2): (potentially multi-lined string)
(Type 3): (potentially multi-lined string)
...
(Type N): (potentially multi-lined string)
(Type 1): (potentially multi-lined string)
...
End Of Document.

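The document repeats the same (Type 1) - (Type N) format M times.

I'm having trouble with the multi-line strings, and with matching the last iteration of (Type 1) - (Type N).

What I need to do is capture each (potentially multi-lined string) in a group named after the (Type #) that precedes it.

Here is a snippet of the document I'm trying to match: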

Name: John Dow
Position: VP. over Development
Bio: Here is a really long string of un important stuff
that could include words like "Bio" or "Name".  Some times I have problems
here, but for the most part it should be normal Bio information
Position History: Vp. over Development
Sr. Project Manager
Jr. Project Manager
Developer
Peon
Notes: Here are some notes that may or may not be multilined
and if it is, all the lines need to be captured for this person.
Name: Joe Noob
Position: Peon
Bio: I'm a peon, so I have little bio
Position History: Peon
Notes: few notes
Name: Jane Smith
Position: VP. over Sales
Bio: Here is a really long string of more un important stuff
that could include words like "Bio" or "Name".  Some times I have problems
here, but for the most part it should be normal Bio information
Position History: Vp. over Sales
Sales Manager
Secretary
Notes: Here are some notes that may or may not be multilined
and if it is, all the lines need to be captured for this person.



The (Type #) labels always appear in the same order, and each one always starts on a new line.

What I have:

Name:\s(?:(?.*?)\r\n)+?Position:\s(?:(?.*?)\r\n)+?Bio:\s(?:(?.*?)\r\n)+?Position History:\s(?:(?.*?)\r\n)+?Notes:\s(?:(?.*?)\r\n)+?



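Any help would be great!

3 Answers:

Answer 0: (score: 3)

Because you are using lazy matching, the last token matches only as little as it has to. You can fix this by adding a lookahead to the end of the pattern so that the match runs up to the next record's first label: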

(?=^Name:|$)

Here is the full regex:

Name:\s(?:(.*?)\s+)Position:\s(?:(.*?)\s+)Bio:\s(?:(.*?)\s+)Position History:\s(?:(.*?)\s+)Notes:\s(?:(.*?)\s+)(?=^Name:|$)

Example: http://regexhero.net/tester/?id=92982feb-806f-4d0a-96a3-5ef6689a0e01
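For reference, here is a minimal C# sketch of the lookahead idea, not the exact pattern above. It makes several assumptions: the class name `RecordParser` and the group names `Name`, `Position`, `Bio`, `History` and `Notes` are made up for illustration, each label starts a line, and the label text does not appear at the start of a line inside a value. It uses `\z` instead of `$` in the lookahead because with `RegexOptions.Multiline` a `$` also matches at the end of every line.

```csharp
using System.Text.RegularExpressions;

public static class RecordParser
{
    // Each field expands lazily until the next label at the start of a line.
    // The final lookahead stops Notes at the next "Name:" line or at the end
    // of the input, so the last record is captured in full as well.
    private static readonly Regex RecordPattern = new Regex(
        @"^Name:\s(?<Name>.*?)\r?\n" +
        @"Position:\s(?<Position>.*?)\r?\n" +
        @"Bio:\s(?<Bio>.*?)\r?\n" +
        @"Position History:\s(?<History>.*?)\r?\n" +
        @"Notes:\s(?<Notes>.*?)" +
        @"(?=\r?\nName:\s|\s*\z)",
        // Multiline: ^ matches at each line start.
        // Singleline: . also matches newlines, so values may span lines.
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns one match per record; read fields via m.Groups["Notes"].Value etc.
    public static MatchCollection Parse(string text) => RecordPattern.Matches(text);
}
```

Calling `RecordParser.Parse(text)` on the sample should give one `Match` per person, with multi-line values returned as a single string that still contains the original line breaks.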

Answer 1: (score: 2)

Try this:

(?'tag'[\w\s]+):\s*(?'val'.*([\r\n][^:]*)*)

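I simply take the label in front of the ':' as the named group 'tag', and the (potentially) multi-line text after it as the named group 'val'.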
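A variation on the same tag/value idea, sketched under the assumption that the set of labels is known in advance. It swaps the generic `[\w\s]+` tag for an explicit list of labels, so a colon inside a value does not start a new field. `FieldScanner`, `Scan` and the `Labels` constant are hypothetical names, and the label list is taken from the sample document.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldScanner
{
    // Assumed label list, copied from the sample document. The ':' required
    // after the label keeps "Position" from matching "Position History".
    private const string Labels = "Name|Position History|Position|Bio|Notes";

    // A field is a label at the start of a line, then a value that expands
    // lazily until the next line that starts with a label, or end of input.
    private static readonly Regex FieldPattern = new Regex(
        @"^(?<tag>" + Labels + @"):\s*(?<val>.*?)" +
        @"(?=\r?\n(?:" + Labels + @"):|\s*\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Returns the (tag, value) pairs in document order.
    public static List<KeyValuePair<string, string>> Scan(string text)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match m in FieldPattern.Matches(text))
        {
            fields.Add(new KeyValuePair<string, string>(
                m.Groups["tag"].Value, m.Groups["val"].Value));
        }
        return fields;
    }
}
```

Since the pairs come back in document order, a new record can be started each time the tag is `Name`. A value line that itself begins with one of the labels followed by ':' would still be split, so this only fits documents where that cannot happen.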

Answer 2: (score: 2)

The simplest fix is to do the matching in right-to-left mode:

Regex r = new Regex(@"Name:\s(?:(.*?)\r\n)+?" +
                    @"Position:\s(?:(.*?)\r\n)+?" +
                    @"Bio:\s(?:(.*?)\r\n)+?" +
                    @"Position History:\s(?:(.*?)\r\n)+?" +
                    @"Notes:\s(?:(.*?)\r\n)+?",
                    RegexOptions.Singleline | RegexOptions.RightToLeft);

By the way, I had to remove a bunch of misplaced question marks to get it to work at all. You do want those groups to capture, don't you?