I hope you can help me. I am using C# on .NET 4.0.
I want to validate a file structure like this:
const string dataFileScr = @"
Start 0
{
Next = 1
Author = rk
Date = 2011-03-10
/* Description = simple */
}
PZ 11
{
IA_return()
}
GDC 7
{
Message = 6
Message = 7
Message = 8
Message = 8
RepeatCount = 2
ErrorMessage = 10
ErrorMessage = 11
onKey[5] = 6
onKey[6] = 4
onKey[9] = 11
}
";
So far I have managed to build this regex pattern:
const string patternFileScr = @"
^
((?:\[|\s)*
(?<Section>[^\]\r\n]*)
(?:\])*
(?:[\r\n]{0,}|\Z))
(
(?:\{) ### !! improve for .ini file, dont take {
(?:[\r\n]{0,}|\Z)
( # Begin capture groups (Key Value Pairs)
(?!\}|\[) # Stop capture groups if a } is found; new section
(?:\s)* # Line with space
(?<Key>[^=]*?) # Any text before the =, matched few as possible
(?:[\s]*=[\s]*) # Get the = now
(?<Value>[^\r\n]*) # Get everything that is not an Line Changes
(?:[\r\n]{0,})
)* # End Capture groups
(?:[\r\n]{0,})
(?:\})?
(?:[\r\n\s]{0,}|\Z)
)*
";
and this C#:
Dictionary <string, Dictionary<string, string>> DictDataFileScr
= (from Match m in Regex.Matches(dataFileScr,
patternFileScr,
RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline)
select new
{
Section = m.Groups["Section"].Value,
kvps = (from cpKey in m.Groups["Key"].Captures.Cast<Capture>().Select((a, i) => new { a.Value, i })
join cpValue in m.Groups["Value"].Captures.Cast<Capture>().Select((b, i) => new { b.Value, i }) on cpKey.i equals cpValue.i
select new KeyValuePair<string, string>(cpKey.Value, cpValue.Value)).OrderBy(_ => _.Key)
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value)
}).ToDictionary(itm => itm.Section, itm => itm.kvps);
This works for
const string dataFileScr = @"
Start 0
{
Next = 1
Author = rk
Date = 2011-03-10
/* Description = simple */
}
GDC 7
{
Message = 6
RepeatCount = 2
ErrorMessage = 10
onKey[5] = 6
onKey[6] = 4
onKey[9] = 11
}
";
in other words, for
Section1
{
key1=value1
key2=value2
}
Section2
{
key1=value1
key2=value2
}
but it does not handle repeated key names. I would like the values grouped by key, so that:
DictDataFileScr["GDC 7"]["Message"] = "6|7|8|8"
DictDataFileScr["GDC 7"]["ErrorMessage"] = "10|11"
....
It also does not work for .ini-style files such as:
[Section1]
key1 = value1
key2 = value2
[Section2]
key1 = value1
key2 = value2
...
and it does not see the next section after
....
PZ 11
{
IA_return()
}
.....
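Answer 0 (score: 2)
Do yourself and your sanity a favor and learn how to use GPLex and GPPG. They are the closest thing C# has to Lex and Yacc (or Flex and Bison, if you prefer), and they are the right tools for this job.
Regular expressions are great tools for robust string matching, but when you need to match structures of strings, you need a "grammar". That is what a parser is for. GPLex takes a bunch of regular expressions and generates a super-fast lexer. GPPG takes the grammar you write and generates a super-fast parser.
Trust me, learn how to use these tools... or any other tools like them. You'll be glad you did.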
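To give a feel for the difference between matching strings and tracking structure, here is a minimal hand-written sketch. It is not GPLex or GPPG output and does not use their APIs; it is a plain line-oriented parser with explicit state. The class name `SectionParser` is made up for illustration, and the sketch assumes that `{` and `}` sit on their own lines, that comments fit on a single line, and that repeated keys should be joined with `|`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of GPLex/GPPG: a hand-written, stateful line parser.
public static class SectionParser
{
    public static Dictionary<string, Dictionary<string, string>> Parse(string text)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        Dictionary<string, string> current = null; // section currently being filled
        string pendingName = null;                 // name line waiting for its "{"
        bool inBraces = false;                     // true between "{" and "}"

        foreach (string raw in text.Split('\n'))
        {
            string line = raw.Trim();
            // Skip blank lines and (assumed single-line) comments.
            if (line.Length == 0 || line.StartsWith("/*", StringComparison.Ordinal))
                continue;

            int eq = line.IndexOf('=');
            if (line == "{")
            {
                if (pendingName == null)
                    throw new FormatException("'{' without a section name");
                current = new Dictionary<string, string>();
                result[pendingName] = current;
                pendingName = null;
                inBraces = true;
            }
            else if (line == "}")
            {
                if (!inBraces)
                    throw new FormatException("'}' without a matching '{'");
                current = null;
                inBraces = false;
            }
            else if (!inBraces && line[0] == '[' && line[line.Length - 1] == ']')
            {
                // INI-style header: the section runs until the next header.
                current = new Dictionary<string, string>();
                result[line.Substring(1, line.Length - 2).Trim()] = current;
            }
            else if (eq > 0)
            {
                if (current == null)
                    throw new FormatException("key/value pair outside a section: " + line);
                string key = line.Substring(0, eq).Trim();
                string value = line.Substring(eq + 1).Trim();
                string old;
                // Repeated keys are joined with '|', e.g. "6|7|8|8".
                current[key] = current.TryGetValue(key, out old) ? old + "|" + value : value;
            }
            else if (!inBraces)
            {
                // A bare line outside braces is taken as the name of the next brace section.
                pendingName = line;
                current = null;
            }
            // Otherwise: a bare statement inside braces, such as IA_return(); ignored here.
        }
        return result;
    }
}
```

Because the parser knows which state it is in, it can report structural errors such as a stray `}`, which is hard to get out of a single regex. A generated lexer and parser give you the same kind of state tracking without writing it by hand.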
Answer 1 (score: 2)
Here is a complete rework of the regex in C#.
Assumptions: (tell me if one or all of them are wrong)
Regex flags:
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled | RegexOptions.Singleline
Test input:
const string dataFileScr = @"
Start 0
{
Next = 1
Author = rk
Date = 2011-03-10
/* Description = simple */
}
PZ 11
{
IA_return()
}
GDC 7
{
Message = 6
Message = 7
Message = 8
Message = 8
RepeatCount = 2
ErrorMessage = 10
ErrorMessage = 11
onKey[5] = 6
onKey[6] = 4
onKey[9] = 11
}
[Section1]
key1 = value1
key2 = value2
[Section2]
key1 = value1
key2 = value2
";
Rewritten regex:
const string patternFileScr = @"
(?<Section> (?# Start of a non ini file section)
(?<SectionName>[\w ]+)\s* (?# Capture section name)
{ (?# Match but don't capture beginning of section)
(?<SectionBody> (?# Capture section body. Section body can be empty)
(?<SectionLine>\s* (?# Capture zero or more line(s) in the section body)
(?: (?# A line can be either a key/value pair, a comment or a function call)
(?<KeyValuePair>(?<Key>[\w\[\]]+)\s*=\s*(?<Value>[\w-]*)) (?# Capture key/value pair. Key and value are sub-captured separately)
|
(?<Comment>/\*.+?\*/) (?# Capture comment)
|
(?<FunctionCall>[\w]+\(\)) (?# Capture function call. A function can't have parameters though)
)\s* (?# Match but don't capture white characters)
)* (?# Zero or more line(s), previously mentioned in comments)
)
} (?# Match but don't capture end of section)
)
|
(?<Section> (?# Start of an ini file section)
\[(?<SectionName>[\w ]+)\] (?# Capture section name)
(?<SectionBody> (?# Capture section body. Section body can be empty)
(?<SectionLine> (?# Capture zero or more line(s) in the section body. Only key/value pair allowed.)
\s*(?<KeyValuePair>(?<Key>[\w\[\]]+)\s*=\s*(?<Value>[\w-]+))\s* (?# Capture key/value pair. Key and value are sub-captured separately)
)* (?# Zero or more line(s), previously mentioned in comments)
)
)
";
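Discussion
The regex is built to match either a non-INI file section (1) or an INI file section (2).
(1) Non-INI file sections: These sections consist of a section name followed by a body enclosed in { and }. A section name can contain letters, digits, or spaces. The section body consists of zero or more lines. A line can be a key/value pair (key = value), a comment (/* this is a comment */), or a function call without parameters (my_function()).
(2) INI file sections: These sections consist of a section name enclosed in [ and ], followed by zero or more key/value pairs. Each pair is on its own line.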
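As a rough sketch of how the captures could be turned into the dictionary from the question, with repeated keys grouped as "6|7|8|8": the code below uses a condensed two-branch pattern in the same spirit as the one above, written without IgnorePatternWhitespace so it fits on a few lines. The class name `SectionRegexDemo` and the condensed pattern are my own illustration, not the exact regex above. Note that a plain ToDictionary over the key/value pairs throws on duplicate keys, which is why the pairs are grouped first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; the pattern is a condensed variant for illustration only.
public static class SectionRegexDemo
{
    // One key/value line: key may contain brackets, value is word characters or '-'.
    const string KeyValue = @"(?<Key>[\w\[\]]+)[ \t]*=[ \t]*(?<Value>[\w-]*)";

    const string Pattern =
        // Branch 1: Name { key = value | /* comment */ | call() ... }
        @"(?<SectionName>\w[\w ]*?)\s*\{(?:\s*(?:" + KeyValue + @"|/\*.*?\*/|\w+\(\)))*\s*\}" +
        // Branch 2: [Name] followed by key = value lines
        @"|\[(?<SectionName>[\w ]+)\](?:\s*" + KeyValue + @")*";

    public static Dictionary<string, Dictionary<string, string>> Parse(string text)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            // Key and Value are always captured together, so the two lists line up.
            var keys = m.Groups["Key"].Captures.Cast<Capture>().Select(c => c.Value);
            var values = m.Groups["Value"].Captures.Cast<Capture>().Select(c => c.Value);

            result[m.Groups["SectionName"].Value] = keys
                .Zip(values, (k, v) => new KeyValuePair<string, string>(k, v))
                .GroupBy(p => p.Key, p => p.Value)                     // collect repeated keys
                .ToDictionary(g => g.Key, g => string.Join("|", g));   // e.g. "6|7|8|8"
        }
        return result;
    }
}
```

With the test input above, `SectionRegexDemo.Parse(dataFileScr)["GDC 7"]["Message"]` is intended to give "6|7|8|8", and a section such as PZ 11 that only contains a function call ends up with an empty dictionary.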
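Answer 2 (score: 0)
#2. Does not work for .ini files
It does not work because your regex says that a { is required after [Section]. Your regex would match if you had something like this: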
[Section] { key = value }
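Answer 3 (score: 0)
Here is an example in Perl. Perl does not have named capture arrays, probably because of backtracking. Maybe you can pick something out of the regex anyway. This assumes there is no nesting of {} brackets.
Edit: never content to leave well enough alone, here is a modified version.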
use strict;
use warnings;
my $str = '
Start 0
{
Next = 1
Author = rk
Date = 2011-03-10
/* Description = simple
*/
}
asdfasdf
PZ 11
{
IA_return()
}
[ section 5 ]
this = that
[ section 6 ]
this_ = _that{hello() hhh = bbb}
TOC{}
GDC 7
{
Message = 6
Message = 7
Message = 8
Message = 8
RepeatCount = 2
ErrorMessage = 10
ErrorMessage = 11
onKey[5] = 6
onKey[6] = 4
onKey[9] = 11
}
';
use re 'eval';
my $rx = qr/
\s*
( \[ [^\S\n]* )? # Grp 1 optional ini section delimiter '['
(?<Section> \w+ (?:[^\S\n]+ \w+)* ) # Grp 2 'Section'
(?(1) [^\S\n]* \] |) # Condition, if we matched '[' then look for ']'
\s*
(?<Body> # Grp 3 'Body' (for display only)
(?(1)| \{ ) # Condition, if we're not a ini section then look for '{'
(?{ print "Section: '$+{Section}'\n" }) # SECTION debug print, remove in production
(?: # _grp_
\s* # whitespace
(?: # _grp_
\/\* .*? \*\/ # some comments
| # OR ..
# Grp 4 'Key' (tested with print, Perl doesn't have named capture arrays)
(?<Key> \w[\w\[\]]* (?:[^\S\n]+ [\w\[\]]+)* )
[^\S\n]* = [^\S\n]* # =
(?<Value> [^\n]* ) # Grp 5 'Value' (tested with print)
(?{ print " k\/v: '$+{Key}' = '$+{Value}'\n" }) # KEY,VALUE debug print, remove in production
| # OR ..
(?(1)| [^{}\n]* ) # any chars except newline and [{}] on the condition we're not a ini section
) # _grpend_
\s* # whitespace
)* # _grpend_ do 0 or more times
(?(1)| \} ) # Condition, if we're not a ini section then look for '}'
)
/x;
while ($str =~ /$rx/xsg)
{
print "\n";
print "Body:\n'$+{Body}'\n";
print "=========================================\n";
}
__END__
Output
Section: 'Start 0'
k/v: 'Next' = '1'
k/v: 'Author' = 'rk'
k/v: 'Date' = '2011-03-10'
Body:
'{
Next = 1
Author = rk
Date = 2011-03-10
/* Description = simple
*/
}'
=========================================
Section: 'PZ 11'
Body:
'{
IA_return()
}'
=========================================
Section: 'section 5'
k/v: 'this' = 'that'
Body:
'this = that
'
=========================================
Section: 'section 6'
k/v: 'this_' = '_that{hello() hhh = bbb}'
Body:
'this_ = _that{hello() hhh = bbb}
'
=========================================
Section: 'TOC'
Body:
'{}'
=========================================
Section: 'GDC 7'
k/v: 'Message' = '6'
k/v: 'Message' = '7'
k/v: 'Message' = '8'
k/v: 'Message' = '8'
k/v: 'RepeatCount' = '2'
k/v: 'ErrorMessage' = '10'
k/v: 'ErrorMessage' = '11'
k/v: 'onKey[5]' = '6'
k/v: 'onKey[6]' = '4'
k/v: 'onKey[9]' = '11'
Body:
'{
Message = 6
Message = 7
Message = 8
Message = 8
RepeatCount = 2
ErrorMessage = 10
ErrorMessage = 11
onKey[5] = 6
onKey[6] = 4
onKey[9] = 11
}'
=========================================