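I want to get better at writing different kinds of regular expressions, so I have been trying to get the following to work.
Here is my expression: https://regex101.com/r/VzspFy/4/
In the test strings, the first three are fine, so that kind of pattern must match. The problem is the last one, which I do not want to match, so I tried this:
https://regex101.com/r/9HVKTK/2
and this:
https://regex101.com/r/9HVKTK/1
But no luck!
The main idea is: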
`aaa ... bbb ccc` -> must match
`ccc ... (aaa|ddd|eee) ... bbb ccc` -> should not match
How can I make this work, or is there a better way to implement it?
Answer 0 (score: 1)
You can use
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
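See the regex demo. You need to get the Group 1 value.
Pattern details
- `(?:^|])` - start of string or `]` (add `| RegexOptions.Multiline` if your input is a multiline string, but I assume these are all standalone strings)
- `(?:(?!\b(?:eng|ita)\b)[^]])*` - zero or more characters other than `]`, as many as possible, none of which starts a whole word `eng` or `ita` (see tempered greedy token for more about this construct)
- `\b` - a word boundary
- `(eng(?:\W+\w+)?\W+sub\W+ita)` - Group 1:
  - `eng` - a literal substring
  - `(?:\W+\w+)?` - an optional sequence of 1+ non-word characters followed by 1+ word characters (in effect, an optional word)
  - `\W+` - 1+ non-word characters
  - `sub` - a literal substring
  - `\W+` - 1+ non-word characters
  - `ita` - a literal substring
- `\b` - a word boundary
See the C# demo: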
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var strs = new List<string> {
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE",
    "Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
    "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
    "Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus",
    "Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS",
    "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni",
    "Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL ",
    "Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]"
};
var rx = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
foreach (var s in strs)
{
    Console.WriteLine(s);
    var result = rx.Match(s);
    // Group 1 holds the "eng ... sub ita" part without the consumed prefix.
    if (result.Success)
        Console.WriteLine("Matched: {0}", result.Groups[1].Value);
    else
        Console.WriteLine("No match!");
    Console.WriteLine("==========================================");
}
Output:
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e01-08 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni SEASON PREMIERE
Matched: Eng Mp3 - Sub Ita
==========================================
Young Sheldon S01e13 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e08 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] WEBMux Morpheus
No match!
==========================================
Young Sheldon S01e14 [SATRip 720p - H264 - Eng Ac3 - Sub Ita] HDTV by AVS
Matched: Eng Ac3 - Sub Ita
==========================================
Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S03e16 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8 [CURA] Fede e Religioni
Matched: Eng Mp3 - Sub Ita
==========================================
Lucifer S02e01-13 [XviD - Eng Mp3 - Sub Ita] DLRip by Pir8 [CURA] Fede e Religioni FULL
Matched: Eng Mp3 - Sub Ita
==========================================
Absentia S01e01-10 [Mux 1080p - H264 - Ita Eng Ac3 - Sub Ita Eng] By Morpheus The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD-[WEB]
Matched: ENG.Sub.ITA
==========================================
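The tempered greedy token from the pattern details can also be looked at in isolation. The following is only a minimal sketch and not part of the original demo: `prefix` is a hypothetical name for the first two parts of the pattern, and the two bracketed inputs are shortened versions of the test strings. It prints how much the prefix consumes and which word stopped it, next to the result of the full pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Only the leading part of the pattern: start of string or ']', then
// characters other than ']' that do not start the whole word eng/ita.
var prefix = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*", RegexOptions.IgnoreCase);
// The full pattern from the answer, for comparison.
var full = new Regex(@"(?:^|])(?:(?!\b(?:eng|ita)\b)[^]])*\b(eng(?:\W+\w+)?\W+sub\W+ita)\b", RegexOptions.IgnoreCase);

// Shortened, illustrative inputs (assumption: not taken verbatim from the demo).
var inputs = new[]
{
    "[XviD - Eng Mp3 - Sub Ita Eng]",     // first language word is Eng
    "[H264 - Ita Eng Ac3 - Sub Ita Eng]"  // first language word is Ita
};

foreach (var s in inputs)
{
    var consumed = prefix.Match(s).Value;             // what the tempered token swallowed
    var stoppedAt = s.Substring(consumed.Length, 3);  // the word that stopped it
    var m = full.Match(s);
    Console.WriteLine("prefix: \"{0}\", stopped before: {1}, full match: {2}",
        consumed, stoppedAt, m.Success ? m.Groups[1].Value : "none");
}
```

In the second input the prefix stops in front of `Ita`, where `\b(eng...` cannot match, and there is no other start position that leads to `eng`, so the whole match fails.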
Answer 1 (score: 0)
Here is a relatively simple regex for this problem:
(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)
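You can test it out here.
- `(?<=[-]\s)` is a positive lookbehind that makes sure the match is preceded by a dash and a whitespace character (without including them in the match).
- `(?:ITA\s)?` is an optional non-capturing group: if the match begins with "ITA" followed by a whitespace character, they are matched as well.
- `\w{3}` matches a run of exactly three word characters (letters, digits, underscores, or any mix of them).
- `\s` matches a single whitespace character.
- `[-]` is just a fancy way of matching a single `-`.
- `|(?:Eng\.sub\.ita)` tells the regex to also match `eng.sub.ita` (case-insensitively, provided the ignore-case flag is set), in addition to the original match, if it appears in the string.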
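If the name of a show contains something like `- red SEO - two one six`, that is, 'dash-space-three_letters-space-three_letters-space-dash-space-three_letters-space-three_letters-space-three_letters', then the name of the show itself will match too.
However, the chance of a show name containing such a format is negligible, so you do not need to worry about it.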
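To see how the pieces of this pattern behave, here is a small sketch under stated assumptions: `RegexOptions.IgnoreCase` is assumed because the pattern itself carries no inline flag, the first two inputs are shortened from the test strings, and the title `Pilot - red SEO - two one six` is made up purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer; IgnoreCase is an assumption (see lead-in).
var rx = new Regex(
    @"(?:(?<=[-]\s)(?:ITA\s)?\w{3}\s\w{3}\s[-]\s\w{3}\s\w{3}\s\w{3}\b)|(?:Eng\.sub\.ita)",
    RegexOptions.IgnoreCase);

var title = "Lucifer S03e15 [XviD - Eng Mp3 - Sub Ita Eng] DLRip By Pir8";
var m = rx.Match(title);
// The lookbehind checks for "- " but leaves it out of the match,
// so the match starts at "Eng", not at the dash.
Console.WriteLine("{0} (index {1})", m.Value, m.Index);

// Second alternative: the dotted form.
var dotted = rx.Match("The.Breadwinner.2017.ENG.Sub.ITA.HDRip.XviD");
Console.WriteLine(dotted.Value);

// Made-up title with the dash/three-character shape: it matches as well.
var falsePositive = rx.Match("Pilot - red SEO - two one six");
Console.WriteLine(falsePositive.Value);
```

Note that, unlike a capture group, this pattern returns the whole match, so the first result also includes the trailing `Eng`.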