Question

I have the following naming scheme for files that I want to parse with a regex:

[Custom/Random Name]_[MainVersion]_[MinorVersion].xls

Currently I have the following regex (which fails):

(?<firstPart>.+)_(?<mainVersion>\d+)(|_(?<minorVersion>\d+))\.xls

Using it on the sample string

Hello World_22_1.xls

results in:

match.Groups["firstPart"].Value == "Hello World_22"
match.Groups["mainVersion"].Value == "1"
match.Groups["minorVersion"].Value == ""

but it should be:

match.Groups["firstPart"].Value == "Hello World"
match.Groups["mainVersion"].Value == "22"
match.Groups["minorVersion"].Value == "1"

The problem is that my regex for "firstPart" allows any character with ".+" (including "_"), so it keeps going until the last occurrence of "_". I could therefore rewrite my regex like this:

(?<firstPart>[^_]+)_(?<mainVersion>\d+)(|_(?<minorVersion>\d+))\.xls

But this regex fails if the file name looks like this:

Hello_World_22_1.xls

which results in:

match.Groups["firstPart"].Value == "World"
match.Groups["mainVersion"].Value == "22"
match.Groups["minorVersion"].Value == "1"

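Is there a way to match the string backwards, since the parts I am looking for are always at the end of the file name?

The regex should return the correct values for these strings (for simplicity, I have written the desired result in parentheses as [firstPart]/[mainVersion]/[minorVersion]):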

Hello World_22_1.xls (Hello World/22/1)
Hello_World_22_1.xls (Hello_World/22/1)
Hello_World_22.xls (Hello_World/22/)
Hello_1_World_22_1.xls (Hello_1_World/22/1)
Hello_1_World_22.xls (Hello_1_World/22/)
Hello_33_2_World_22_1.xls (Hello_33_2_World/22/1)
Hello_22_1_World.xls (//) --> (I wouldn't mind if your solution returned Hello_22_1_World as firstPart)
33_22.xls (33/22/)
33_22_1.xls (33/22/1)

I have tried reversing the input string, but this "solution" is very questionable:

static void Main(string[] args)
{
    Console.WriteLine(TestRegEx("Hello World_22_1.xls", "Hello World", "22", "1"));
    Console.WriteLine(TestRegEx("Hello_World_22_1.xls", "Hello_World", "22", "1"));
    Console.WriteLine(TestRegEx("Hello_World_22.xls", "Hello_World", "22", ""));
    Console.WriteLine(TestRegEx("Hello_1_World_22_1.xls", "Hello_1_World", "22", "1"));
    Console.WriteLine(TestRegEx("Hello_1_World_22.xls", "Hello_1_World", "22", ""));
    Console.WriteLine(TestRegEx("Hello_33_2_World_22_1.xls", "Hello_33_2_World", "22", "1"));
    Console.WriteLine(TestRegEx("Hello_22_1_World.xls", "", "", ""));
    Console.WriteLine(TestRegEx("33_22.xls", "33", "22", ""));
    Console.WriteLine(TestRegEx("33_22_1.xls", "33", "22", "1"));

    Console.ReadLine();
}

private static bool TestRegEx(string str, string firstPart, string mainVersion, string minorVersion)
{
    var regEx = new Regex("slx\\.((?<minorVersion>\\d+)_|)(?<mainVersion>\\d+)_(?<firstPart>.+)");
    var reverseStr = new string(str.Reverse().ToArray());

    var match = regEx.Match(reverseStr);
    var x1 = new string(match.Groups["firstPart"].Value.Reverse().ToArray());
    var x2 = new string(match.Groups["mainVersion"].Value.Reverse().ToArray());
    var x3 = new string(match.Groups["minorVersion"].Value.Reverse().ToArray());

    return x1 == firstPart && x2 == mainVersion && x3 == minorVersion;
}

Answer 1

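The main trouble is, of course, the greedy dot pattern at the start: it first grabs the whole input and then backtracks only far enough to yield the last run of digits. To be able to use the optional group and get its contents (if any), you need a lazy quantifier on the dot pattern.

I suggest using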

(?<firstPart>.+?)(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)?\.xls

See the regex demo.
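
To make the greedy versus lazy difference concrete, here is a minimal sketch that runs the question's greedy pattern and the lazy pattern suggested above on the same sample string. The class name `GreedyVsLazyDemo` and the helper `Parse` are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original post.
public static class GreedyVsLazyDemo
{
    // The question's pattern: greedy .+ for firstPart.
    public const string Greedy =
        @"(?<firstPart>.+)_(?<mainVersion>\d+)(|_(?<minorVersion>\d+))\.xls";

    // The pattern suggested above: lazy .+? for firstPart.
    public const string Lazy =
        @"(?<firstPart>.+?)(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)?\.xls";

    // Formats the three named groups as "firstPart/mainVersion/minorVersion".
    public static string Parse(string pattern, string fileName)
    {
        Match m = Regex.Match(fileName, pattern);
        if (!m.Success) return "no match";
        return string.Join("/",
            m.Groups["firstPart"].Value,
            m.Groups["mainVersion"].Value,
            m.Groups["minorVersion"].Value);
    }

    public static void Main()
    {
        const string sample = "Hello World_22_1.xls";
        Console.WriteLine(Parse(Greedy, sample)); // Hello World_22/1/
        Console.WriteLine(Parse(Lazy, sample));   // Hello World/22/1
    }
}
```

The greedy version gives up characters only until the rest of the pattern can match, so it stops at the last underscore; the lazy version grows firstPart one character at a time and stops at the first position where the version suffix and .xls fit.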

Details:

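(?<firstPart>.+?) - the "firstPart" group, capturing one or more characters other than a newline, as few as possible because of the lazy +? quantifier
(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)? - 1 or 0 occurrences of:
- _(?<mainVersion>\d+) - an underscore, then the "mainVersion" group capturing one or more digits
- (?:_(?<minorVersion>\d+))? - an optional sequence of:
  - _ - an underscore
  - (?<minorVersion>\d+) - the "minorVersion" group capturing 1+ digits
\.xls - the literal .xls substring.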

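I prefer the pattern above to (?<firstPart>.+?)_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?\.xls because the latter does not match Hello_22_1_World.xls at all. If you do not need to match that name, the latter expression may be more suitable.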
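
As a rough check, the two patterns discussed in this answer can be run over the sample names from the question. This is only a sketch: the class `OptionalVersionDemo`, the constant names `Lenient` and `Strict`, and the helper `Parse` are hypothetical labels chosen for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original post.
public static class OptionalVersionDemo
{
    // Whole version suffix optional: also matches names that carry no version.
    public const string Lenient =
        @"(?<firstPart>.+?)(?:_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?)?\.xls";

    // mainVersion required, minorVersion optional.
    public const string Strict =
        @"(?<firstPart>.+?)_(?<mainVersion>\d+)(?:_(?<minorVersion>\d+))?\.xls";

    // Formats the three named groups as "firstPart/mainVersion/minorVersion".
    public static string Parse(string pattern, string fileName)
    {
        Match m = Regex.Match(fileName, pattern);
        if (!m.Success) return "no match";
        return string.Join("/",
            m.Groups["firstPart"].Value,
            m.Groups["mainVersion"].Value,
            m.Groups["minorVersion"].Value);
    }

    public static void Main()
    {
        string[] samples =
        {
            "Hello World_22_1.xls", "Hello_World_22_1.xls", "Hello_World_22.xls",
            "Hello_1_World_22_1.xls", "Hello_1_World_22.xls",
            "Hello_33_2_World_22_1.xls", "Hello_22_1_World.xls",
            "33_22.xls", "33_22_1.xls"
        };

        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> lenient: " + Parse(Lenient, s)
                                + ", strict: " + Parse(Strict, s));
        }
    }
}
```

The two patterns should differ only for Hello_22_1_World.xls: the lenient one returns the whole name as firstPart with empty version groups, while the strict one does not match.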

Answer 2

Use this:

^(?<firstPart>.+?)_(?<mainVersion>\d+)_(?<minorVersion>\d+)\.xls$

Here is a DEMO.
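
Note that in this pattern the minorVersion group is not optional, so names without a minor version are not expected to match. The sketch below checks that against a few of the question's samples; the class `RequiredMinorVersionDemo` and the helper `Parse` are hypothetical names used only here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original post.
public static class RequiredMinorVersionDemo
{
    // Anchored pattern from this answer: both version numbers are required.
    public const string Pattern =
        @"^(?<firstPart>.+?)_(?<mainVersion>\d+)_(?<minorVersion>\d+)\.xls$";

    // Formats the three named groups as "firstPart/mainVersion/minorVersion".
    public static string Parse(string fileName)
    {
        Match m = Regex.Match(fileName, Pattern);
        if (!m.Success) return "no match";
        return string.Join("/",
            m.Groups["firstPart"].Value,
            m.Groups["mainVersion"].Value,
            m.Groups["minorVersion"].Value);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("Hello_World_22_1.xls")); // Hello_World/22/1
        Console.WriteLine(Parse("33_22_1.xls"));          // 33/22/1
        Console.WriteLine(Parse("Hello_World_22.xls"));   // no match
        Console.WriteLine(Parse("33_22.xls"));            // no match
    }
}
```

If names with only a main version must be accepted, wrapping the trailing part as (?:_(?<minorVersion>\d+))? would be one way to relax the pattern.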

RegEx - version in file name
