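I would like to know whether it is possible to convert a wildcard expression that uses * and ? into a regular expression, in order to check whether it matches certain strings.
In other words, if I apply the filter *bl?e* (case-insensitive) to these strings: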
["Blue", "Black", "Red", "Light blue", "Light black"]
I want to get:
["Blue", "Light blue"].
Does anyone know how to do this with a regular expression? Is there a better way than using a regular expression?
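Added to better clarify my thinking...
OK! ...As always, I thought I was asking a perfectly clear question, and the answers made me realize I had completely botched it. I want to write a function that filters a collection according to an expression (passed as a parameter to my function), using the same rules as DOS ('*' and '?'). I thought a regex would be a good idea. Am I right, and what would the regex be? Also... I'm using C#, and I wonder whether there is anything available that does this directly?
I also looked at How do I specify a wildcard (for ANY character) in a c# regex statement? (very good answers).
In the end I used the Glob class from the .NET Patterns and Practices library.
But for reference, here is my code for converting a glob expression to a regex: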
using System.Text;
using System.Text.RegularExpressions;

namespace HQ.Util.General
{
    public class RegexUtil
    {
        // Do not change the order: the algorithm depends on it (the first 2 chars must be the DOS-like wildcard chars).
        public const string RegExMetaChars = @"*?(){}[]+-^$.|\";

        // ******************************************************************
        /// <summary>
        /// Convert a filter expression with '*' (wildcard for any number of chars) and '?' (wildcard for exactly one char)
        /// into a valid, case-insensitive regex that must match the whole string, escaping any other special regex character.
        /// </summary>
        /// <param name="dosLikeExpressionFilter">DOS-like filter expression, e.g. "*bl?e*".</param>
        /// <returns>The equivalent regex pattern, e.g. "(?i)^.*bl.e.*$".</returns>
        public static string DosLikeExpressionFilterToRegExFilterExpression(string dosLikeExpressionFilter)
        {
            StringBuilder regex = new StringBuilder();
            regex.Append("(?i)^"); // Case insensitive, anchored at the start of the string
            foreach (char c in dosLikeExpressionFilter)
            {
                int metaIndex = RegExMetaChars.IndexOf(c);
                if (metaIndex == 0)
                {
                    regex.Append(".*");   // '*' -> any number of chars
                }
                else if (metaIndex == 1)
                {
                    regex.Append('.');    // '?' -> exactly one char
                }
                else if (metaIndex > 1)
                {
                    regex.Append('\\');   // Other regex metachar -> escape it
                    regex.Append(c);
                }
                else
                {
                    regex.Append(c);      // Ordinary char, copied as-is
                }
            }
            regex.Append('$'); // Anchored at the end too, so the whole string must match (like DOS)
            return regex.ToString();
        }

        // ******************************************************************
        /// <summary>
        /// See 'DosLikeExpressionFilterToRegExFilterExpression' for the type of Regex returned.
        /// </summary>
        /// <param name="dosLikeExpressionFilter">DOS-like filter expression, e.g. "*bl?e*".</param>
        /// <returns>A Regex built from the converted expression.</returns>
        public static Regex DosLikeExpressionFilterToRegEx(string dosLikeExpressionFilter)
        {
            return new Regex(DosLikeExpressionFilterToRegExFilterExpression(dosLikeExpressionFilter));
        }
        // ******************************************************************
    }
}
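On the "is there anything built in" part of the question: newer .NET versions (.NET Core 3.0 and later) ship `System.IO.Enumeration.FileSystemName.MatchesSimpleExpression`, which implements exactly these '*' and '?' rules without going through a regex. The sketch below applies it to the sample list from the question; the class and method names `BuiltInWildcardFilter.Filter` are hypothetical, and it assumes a runtime where that API is available (it is not in the classic .NET Framework).

```csharp
using System.Collections.Generic;
using System.IO.Enumeration;
using System.Linq;

public static class BuiltInWildcardFilter
{
    // Hypothetical helper: keeps the items whose whole text matches a DOS-like pattern
    // ('*' = any run of chars, '?' = exactly one char, '\' escapes).
    // MatchesSimpleExpression's ignoreCase parameter defaults to true, so this is case-insensitive.
    public static List<string> Filter(IEnumerable<string> items, string pattern) =>
        items.Where(item => FileSystemName.MatchesSimpleExpression(pattern, item)).ToList();
}
```

With the question's input, `BuiltInWildcardFilter.Filter(new[] { "Blue", "Black", "Red", "Light blue", "Light black" }, "*bl?e*")` keeps "Blue" and "Light blue".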
Answer 0 (score: 2)
                 Any single character   Any number of characters   Character range
Glob syntax      ?                      *                          [0-9]
Regex syntax     .                      .*                         [0-9]
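So Bl?e (glob) becomes Bl.e (regex), and *Bl?e* becomes .*Bl.e.*.
As Joey correctly pointed out, you can (usually, depending on the regex engine) prepend (?i) to the regex to make it case-insensitive.
Note that many characters that have no special meaning in globbing patterns do have a special meaning in regular expressions, so you cannot just do a simple search-and-replace from glob to regex: those characters must be escaped first.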
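To make the escaping point concrete, here is a minimal C# sketch (the names `GlobDemo`, `GlobToRegex` and `NaiveGlobToRegex` are hypothetical) that applies the table above one character at a time and escapes everything else, next to a naive search-and-replace for comparison. It assumes only '*' and '?' need translating; character ranges such as [0-9] are not handled.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class GlobDemo
{
    // Translates a glob one character at a time: '*' -> ".*", '?' -> ".",
    // and every other character goes through Regex.Escape, so metacharacters
    // such as '.', '+' or '(' are matched literally.
    // Character ranges like [0-9] are NOT supported here ('[' gets escaped).
    public static string GlobToRegex(string glob)
    {
        var sb = new StringBuilder();
        foreach (char c in glob)
        {
            switch (c)
            {
                case '*': sb.Append(".*"); break;
                case '?': sb.Append('.'); break;
                default: sb.Append(Regex.Escape(c.ToString())); break;
            }
        }
        return sb.ToString();
    }

    // Naive search-and-replace, for comparison: '.' and other metacharacters stay unescaped.
    public static string NaiveGlobToRegex(string glob) =>
        glob.Replace("*", ".*").Replace("?", ".");
}
```

`GlobToRegex("*Bl?e*")` returns `.*Bl.e.*`, as in the table. For `file.txt`, however, the naive version returns `file.txt`, so the anchored pattern `^file.txt$` also matches "fileXtxt", while the escaped version returns `file\.txt` and does not.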
Answer 1 (score: 1)
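I needed to solve the same problem (using * and ? wildcard patterns from user input to filter an arbitrary list of strings), with one extension: it should be possible to escape a star or question mark in order to search for it literally.
Since the SQL LIKE operator (whose wildcards are % and _) often offers a backslash for escaping, I took the same approach. This complicates the simple approach of calling Regex.Escape() and then replacing * with .* and ? with . in the regex (see the many other answers to this question).
The following code outlines an approach that builds a regular expression for such wildcard patterns. It is implemented as an extension method on C# strings. The doc tags and comments should fully explain the code: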
using System.Text.RegularExpressions;

public static class MyStringExtensions
{
    /// <summary>Interpret this string as a wildcard pattern and create a corresponding regular expression.
    /// Rules for simple wildcard matching are:
    /// *  Matches any character zero or more times.
    /// ?  Matches any character exactly one time.
    /// \  Backslash can be used to escape the above wildcards (and itself) for an explicit match,
    ///    e.g. \* then matches a single star, \? matches a question mark and \\ matches a backslash.
    ///    If \ is not followed by a star, question mark or backslash, it also matches a single backslash.
    /// Character set matching (by use of square brackets []) is NOT supported by this implementation.
    /// </summary>
    /// <param name="wildPat">This string, to be used as the wildcard match pattern.</param>
    /// <param name="caseSens">Optional parameter for case-sensitive matching; the default is case-insensitive.</param>
    /// <returns>New instance of a regular expression performing the requested matches.
    /// If the input string is null or empty, null is returned.</returns>
    public static Regex CreateWildcardRegEx(this string wildPat, bool caseSens = false)
    {
        if (string.IsNullOrEmpty(wildPat))
            return null;

        // STEP 1: Escape all characters that are special in a Regex, to avoid unwanted behavior later.
        // Regex.Escape() prepends a backslash to any of the following characters: \*+?|{[()^$.# and white space
        wildPat = Regex.Escape(wildPat);

        // STEP 2: Replace the three possible escape sequences defined for our wildcard pattern
        // with temporary substrings that CANNOT occur anymore after STEP 1.
        // Constant strings used below; @ makes a C# string literal verbatim, so a backslash need not be doubled.
        const string esc = @"\\";     // Matches a backslash in a Regex
        const string any = @"\*";     // Matches a star in a Regex
        const string sgl = @"\?";     // Matches a question mark in a Regex
        const string tmpEsc = @"||\"; // Instead of the doubled |, any character that Regex.Escape() escapes would do (except \ itself!)
        const string tmpAny = "||*";  // See comment above
        const string tmpSgl = "||?";  // See comment above
        // Note that string.Replace() in C# does NOT stop after the first match; it replaces every occurrence.
        wildPat = wildPat.Replace(Regex.Escape(esc), tmpEsc)
                         .Replace(Regex.Escape(any), tmpAny)
                         .Replace(Regex.Escape(sgl), tmpSgl);

        // STEP 3: Substitute our simple wildcards (escaped in STEP 1) with their Regex counterparts.
        const string regAny = ".*";   // Matches any character zero or more times in a Regex
        wildPat = wildPat.Replace(any, regAny)
                         .Replace(sgl, "."); // . matches any single character in a Regex

        // STEP 4: Revert the temporary replacements of STEP 2 (in reverse order), using what a Regex really needs to match.
        wildPat = wildPat.Replace(tmpSgl, sgl)
                         .Replace(tmpAny, any)
                         .Replace(tmpEsc, esc);

        // STEP 5 (optional, for performance): Collapse repeated * wildcards (******* or similar) into one regAny.
        // A single Regex.Replace() is used instead of string.Contains() with string.Replace() in a while loop.
        wildPat = Regex.Replace(wildPat, @"(\.\*){2,}", regAny);

        // STEP 6: Finalize the Regex with start and end anchors, so the whole string must match.
        return new Regex('^' + wildPat + '$', caseSens ? RegexOptions.None : RegexOptions.IgnoreCase);
        // STEPS 2 and 4 would be unnecessary if we did not want the ability to escape * and ? for searching.
    }
}
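For comparison, the same rules can also be implemented in a single left-to-right pass that decides, character by character, whether a backslash starts an escape. This avoids the temporary placeholder strings of STEPS 2 and 4. The sketch below is not the code from this answer; the class name `WildcardRegexSinglePass` and method `Create` are hypothetical, and it assumes the same semantics as documented above (including a lone backslash matching itself).

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class WildcardRegexSinglePass
{
    // Hypothetical single-pass alternative with the same rules:
    // '*' = any chars (zero or more), '?' = exactly one char,
    // '\' escapes '*', '?' and '\'; any other '\' matches a literal backslash.
    public static Regex Create(string wildPat, bool caseSens = false)
    {
        if (string.IsNullOrEmpty(wildPat))
            return null;

        var sb = new StringBuilder("^");
        bool prevWasStar = false; // Used to collapse "***" into a single ".*"
        for (int i = 0; i < wildPat.Length; i++)
        {
            char c = wildPat[i];
            if (c == '*')
            {
                if (!prevWasStar)
                    sb.Append(".*");
                prevWasStar = true;
                continue;
            }
            prevWasStar = false;

            if (c == '\\' && i + 1 < wildPat.Length &&
                (wildPat[i + 1] == '*' || wildPat[i + 1] == '?' || wildPat[i + 1] == '\\'))
            {
                // Escaped wildcard or backslash: emit the next char literally and skip it.
                i++;
                sb.Append(Regex.Escape(wildPat[i].ToString()));
            }
            else if (c == '?')
            {
                sb.Append('.');
            }
            else
            {
                // Everything else (including a lone '\') is matched literally.
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(), caseSens ? RegexOptions.None : RegexOptions.IgnoreCase);
    }
}
```

For example, `Create(@"a\*b")` produces `^a\*b$`, which matches "a*b" but not "axxb", and `Create("a***b")` produces `^a.*b$`.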
Answer 2 (score: 0)
Try this regex:
^([\w,\s]*bl\we[\w,\s]*)
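It basically matches a run of words and whitespace (commas are allowed too) that contains "bl", followed by exactly one word character, followed by "e". Or: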
^([\w,\s]*bl(\w+)e[\w,\s]*)
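if you want "bl" and "e" to be separated by one or more word characters instead of exactly one.
Another option is to use some inexact (fuzzy) matching algorithm on the strings. I'm not sure whether that is exactly what you are looking for.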
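As a quick check of the first pattern against the question's sample list, here is a small C# sketch (class and method names hypothetical). It adds `RegexOptions.IgnoreCase`, which the answer does not mention, because the question asks for case-insensitive matching and without it "Blue" would not match `bl`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnswerTwoDemo
{
    // First pattern from this answer; RegexOptions.IgnoreCase is an added assumption.
    static readonly Regex Pattern = new Regex(@"^([\w,\s]*bl\we[\w,\s]*)", RegexOptions.IgnoreCase);

    // Keeps the items in which the pattern finds a match.
    public static List<string> Filter(IEnumerable<string> items) =>
        items.Where(s => Pattern.IsMatch(s)).ToList();
}
```

On ["Blue", "Black", "Red", "Light blue", "Light black"] this keeps "Blue" and "Light blue". Unlike a general glob-to-regex conversion, the pattern is hand-written for this one filter: the prefix may only contain word characters, commas and whitespace, so an input such as "Dark-blue" is rejected.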
Answer 3 (score: 0)
Just for reference... this is the code I actually used:
UserBundle