I have the following schema for files that I want to parse with a RegEx:
[Custom/Random Name]_[MainVersion]_[MinorVersion].xls
Currently I have the following RegEx (which fails):
(?<firstPart>.+)_(?<mainVersion>\d+)(|_(?<minorVersion>\d+))\.xls
Using it with the sample string
Hello World_22_1.xls
results in:
match.Groups["firstPart"].Value == "Hello World_22"
match.Groups["mainVersion"].Value == "1"
match.Groups["minorVersion"].Value == ""
but it should be
match.Groups["firstPart"].Value == "Hello World"
match.Groups["mainVersion"].Value == "22"
match.Groups["minorVersion"].Value == "1"
The problem is that my RegEx for "firstPart" allows any character with ".+" (including "_"), so it runs up to the last occurrence of "_". Because of that, I could rewrite my RegEx like this:
(?<firstPart>[^_]+)_(?<mainVersion>\d+)(|_(?<minorVersion>\d+))\.xls
But this RegEx fails if the fileName looks like this:
Hello_World_22_1.xls
which results in:
match.Groups["firstPart"].Value == "World"
match.Groups["mainVersion"].Value == "22"
match.Groups["minorVersion"].Value == "1"
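Is there a way to validate the string backwards, since what I am looking for is always at the end of the fileName?
The RegEx should return the correct values for these strings (for simplicity, I have written the desired results in parentheses as firstPart/mainVersion/minorVersion):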
Hello World_22_1.xls (Hello World/22/1)
Hello_World_22_1.xls (Hello_World/22/1)
Hello_World_22.xls (Hello_World/22/)
Hello_1_World_22_1.xls (Hello_1_World/22/1)
Hello_1_World_22.xls (Hello_1_World/22/)
Hello_33_2_World_22_1.xls (Hello_33_2_World/22/1)
Hello_22_1_World.xls (//) --> (I wouldn't mind if your solution returned Hello_22_1_World as firstPart)
33_22.xls (33/22/)
33_22_1.xls (33/22/1)
I tried reversing the input string, but this "solution" is very questionable:
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // Each call prints True when all three groups equal the expected values.
            Console.WriteLine(TestRegEx("Hello World_22_1.xls", "Hello World", "22", "1"));
            Console.WriteLine(TestRegEx("Hello_World_22_1.xls", "Hello_World", "22", "1"));
            Console.WriteLine(TestRegEx("Hello_World_22.xls", "Hello_World", "22", ""));
            Console.WriteLine(TestRegEx("Hello_1_World_22_1.xls", "Hello_1_World", "22", "1"));
            Console.WriteLine(TestRegEx("Hello_1_World_22.xls", "Hello_1_World", "22", ""));
            Console.WriteLine(TestRegEx("Hello_33_2_World_22_1.xls", "Hello_33_2_World", "22", "1"));
            Console.WriteLine(TestRegEx("Hello_22_1_World.xls", "", "", ""));
            Console.WriteLine(TestRegEx("33_22.xls", "33", "22", ""));
            Console.WriteLine(TestRegEx("33_22_1.xls", "33", "22", "1"));
            Console.ReadLine();
        }

        private static bool TestRegEx(string str, string firstPart, string mainVersion, string minorVersion)
        {
            // The pattern is written for the reversed file name, so the extension and versions come first.
            var regEx = new Regex("slx\\.((?<minorVersion>\\d+)_|)(?<mainVersion>\\d+)_(?<firstPart>.+)");
            var reverseStr = new string(str.Reverse().ToArray());
            var match = regEx.Match(reverseStr);
            // Reverse every captured group back into its original order.
            var x1 = new string(match.Groups["firstPart"].Value.Reverse().ToArray());
            var x2 = new string(match.Groups["mainVersion"].Value.Reverse().ToArray());
            var x3 = new string(match.Groups["minorVersion"].Value.Reverse().ToArray());
            return x1 == firstPart && x2 == mainVersion && x3 == minorVersion;
        }
    }
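Answer 0 (score: 2)
The main trouble is, of course, that the greedy dot pattern at the start first grabs the whole input, and backtracking then gives back only the last run of digits. To be able to use optional groups and still get their contents (if any), you need a lazy quantifier on the dot pattern.
I suggest using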
(?<firstPart>.+?)(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)?\.xls
See the regex demo.
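To see how the lazy quantifier behaves on the file names from the question, here is a minimal C# sketch. The pattern is copied from this answer unchanged; the class name `LazyPatternDemo` and the helper `Describe` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyPatternDemo
{
    // Pattern from the answer: the lazy .+? keeps firstPart as short as possible,
    // so the optional version groups get a chance to match before \.xls.
    private static readonly Regex Pattern = new Regex(
        @"(?<firstPart>.+?)(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)?\.xls");

    // Hypothetical helper: formats the groups as firstPart/mainVersion/minorVersion.
    public static string Describe(string fileName)
    {
        Match match = Pattern.Match(fileName);
        if (!match.Success)
        {
            return "(no match)";
        }
        return match.Groups["firstPart"].Value + "/"
             + match.Groups["mainVersion"].Value + "/"
             + match.Groups["minorVersion"].Value;
    }

    public static void Main()
    {
        string[] samples =
        {
            "Hello World_22_1.xls", "Hello_World_22_1.xls", "Hello_World_22.xls",
            "Hello_1_World_22_1.xls", "Hello_1_World_22.xls", "Hello_33_2_World_22_1.xls",
            "Hello_22_1_World.xls", "33_22.xls", "33_22_1.xls"
        };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + Describe(sample));
        }
    }
}
```

With this pattern, Hello_22_1_World.xls still matches: the whole name ends up in firstPart and both version groups stay empty, which the question says is acceptable.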
Details:
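- (?<firstPart>.+?) - the "firstPart" group, capturing one or more characters other than a newline, as few as possible because of the lazy +? quantifier
- (?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)? - 1 or 0 occurrences of:
  - _(?<mainVersion>\d+) - a _ and the "mainVersion" group capturing one or more digits
  - (?:_(?<minorVersion>\d+))? - an optional sequence of:
    - _ - an underscore
    - (?<minorVersion>\d+) - a "minorVersion" group capturing 1+ digits
- \.xls - the .xls substring.

I would prefer the regex
(?<firstPart>.+?)_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?\.xls
because it does not match Hello_22_1_World.xls at all. If you do not need to match that file name, this last expression may be more suitable.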
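The difference between the two expressions can be checked with a short C# sketch. The stricter pattern is taken from this answer as written; the class name `StrictPatternDemo` and the method `TryParse` are hypothetical and exist only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictPatternDemo
{
    // Stricter pattern from the answer: "_" plus a main version is mandatory,
    // only the minor version remains optional.
    private static readonly Regex Pattern = new Regex(
        @"(?<firstPart>.+?)_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?\.xls");

    // Hypothetical helper: returns false when the file name carries no trailing version.
    // Groups of a failed match report an empty string as their Value.
    public static bool TryParse(string fileName, out string firstPart,
                                out string mainVersion, out string minorVersion)
    {
        Match match = Pattern.Match(fileName);
        firstPart = match.Groups["firstPart"].Value;
        mainVersion = match.Groups["mainVersion"].Value;
        minorVersion = match.Groups["minorVersion"].Value;
        return match.Success;
    }

    public static void Main()
    {
        string[] samples = { "Hello_33_2_World_22_1.xls", "33_22.xls", "Hello_22_1_World.xls" };
        foreach (string sample in samples)
        {
            string first, main, minor;
            bool ok = TryParse(sample, out first, out main, out minor);
            Console.WriteLine(sample + " -> matched=" + ok + " " + first + "/" + main + "/" + minor);
        }
    }
}
```

Here Hello_22_1_World.xls is rejected outright instead of being returned as a firstPart with empty versions, so the caller can tell "no version found" apart from a successful parse.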
Answer 1 (score: 0)