Filtering a list of strings with a regular expression, but using wildcards (* and ?)?

Posted: 2012-05-11 20:14:07

Tags: c# regex

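I would like to know whether a wildcard expression using * and ? can be converted into a regular expression, so that I can check whether it matches certain strings.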

In other words, if I apply this filter (case-insensitive) to these strings: *bl?e*

["Blue", "Black", "Red", "Light blue", "Light black"]

I want to get:

["Blue", "Light blue"]

Does anyone know how to do this with a regular expression? And is there a better way than using a regex?
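On the second question, one alternative that avoids regex entirely is to match the wildcard pattern directly. The following is a minimal sketch of the common greedy two-pointer matcher (it is not taken from this thread, and the class name WildcardMatcher is hypothetical): on a mismatch it backtracks to the most recent '*' and lets it absorb one more character. Comparing with char.ToUpperInvariant is an assumption that suits the color names here but is not a full culture-aware case-insensitive comparison.

```csharp
public static class WildcardMatcher
{
    // True if the whole 'text' matches 'pattern', where '*' matches any run of
    // characters (including none) and '?' matches exactly one character.
    // Letters are compared case-insensitively via char.ToUpperInvariant.
    public static bool IsMatch(string text, string pattern)
    {
        int t = 0, p = 0;
        int starP = -1; // pattern index just after the most recent '*', or -1 if none yet
        int starT = 0;  // text index where that '*' started matching

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                starP = ++p;  // remember where to resume in the pattern
                starT = t;    // the '*' first tries to match the empty string
            }
            else if (p < pattern.Length &&
                     (pattern[p] == '?' || CharEquals(pattern[p], text[t])))
            {
                t++;
                p++;
            }
            else if (starP >= 0)
            {
                p = starP;    // backtrack: let the last '*' absorb one more character
                t = ++starT;
            }
            else
            {
                return false; // mismatch and no '*' to fall back on
            }
        }

        // The text is consumed; any remaining pattern characters must all be '*'
        while (p < pattern.Length && pattern[p] == '*')
        {
            p++;
        }
        return p == pattern.Length;
    }

    private static bool CharEquals(char a, char b) =>
        char.ToUpperInvariant(a) == char.ToUpperInvariant(b);
}
```

With this, WildcardMatcher.IsMatch("Light blue", "*bl?e*") is true, while IsMatch("blue sky", "bl?e") is false because the whole string must match.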

  

Added to better clarify my idea...

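OK! ...As usual, I thought I was asking a very clear question, and the answers made me realize I had completely botched it. I want to write a function that filters a collection according to an expression (passed as a parameter to my function), using the same rules as DOS ('*' and '?'). I thought using a regex would be a good idea. Am I right, and what would the regex be? Also... I am using C#, and I wonder whether there isn't something available that does this directly?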
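Regarding something built in: newer versions of .NET (.NET Core and .NET 5+, not the .NET Framework available in 2012) include System.IO.Enumeration.FileSystemName.MatchesSimpleExpression, which matches '*' and '?' against a whole name. A minimal sketch of the requested filter function on top of it, assuming such a runtime (WildcardFilter and Filter are hypothetical names):

```csharp
using System.Collections.Generic;
using System.IO.Enumeration;
using System.Linq;

public static class WildcardFilter
{
    // Keeps the items whose whole text matches the DOS-like pattern.
    // MatchesSimpleExpression(expression, name, ignoreCase) understands '*' and '?';
    // passing true for ignoreCase makes the comparison case-insensitive.
    public static List<string> Filter(IEnumerable<string> items, string pattern)
    {
        return items
            .Where(item => FileSystemName.MatchesSimpleExpression(pattern, item, true))
            .ToList();
    }
}
```

Called with the question's list and "*bl?e*", this should keep "Blue" and "Light blue".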

I also looked at How do I specify a wildcard (for ANY character) in a c# regex statement? (very good answers).

In the end I used the Glob class from the .NET Patterns and Practices library.

But for reference, here is my code that converts a glob expression into a regex:

using System.Text;
using System.Text.RegularExpressions;

namespace HQ.Util.General
{
    public class RegexUtil
    {
        public const string RegExMetaChars = @"*?(){}[]+-^$.|\"; // Do not change the order: the algorithm relies on the first 2 chars being the DOS-like wildcards '*' and '?'

        // ******************************************************************
        /// <summary>
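        /// Convert a filter expression with '*' (wildcard for any number of chars) and '?' (wildcard for exactly one char) into a valid regex,
        /// escaping any other special regex character
        /// </summary>
        /// <param name="dosLikeExpressionFilter">DOS-like wildcard pattern, e.g. "*bl?e*".</param>
        /// <returns>The equivalent case-insensitive regex pattern.</returns>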
        public static string DosLikeExpressionFilterToRegExFilterExpression(string dosLikeExpressionFilter)
        {
            StringBuilder regex = new StringBuilder();
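            regex.Append("(?i)^"); // Case insensitive; '^' anchors the match at the start of the input

            int startIndex = 0;
            int count = dosLikeExpressionFilter.Length;
            while (startIndex < count)
            {
                int metaIndex = RegExMetaChars.IndexOf(dosLikeExpressionFilter[startIndex]);
                if (metaIndex >= 0)
                {
                    if (metaIndex == 0)
                    {
                        regex.Append(".*"); // '*': any number of characters
                    }
                    else if (metaIndex == 1)
                    {
                        regex.Append(".");  // '?': exactly one character
                    }
                    else
                    {
                        // Any other regex meta character is escaped so that it matches literally
                        regex.Append("\\");
                        regex.Append(dosLikeExpressionFilter[startIndex]);
                    }
                }
                else
                {
                    regex.Append(dosLikeExpressionFilter[startIndex]);
                }
                startIndex++;
            }

            // Anchor at the end too: like a DOS wildcard, the pattern must match the whole input
            // (without ^ and $, "bl?e" would also accept "blue sky")
            regex.Append("$");
            return regex.ToString();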
        }

        // ******************************************************************
        /// <summary>
        /// See the description of 'DosLikeExpressionFilterToRegExFilterExpression' for the kind of regex that is returned
        /// </summary>
        /// <param name="dosLikeExpressionFilter">DOS-like wildcard pattern, e.g. "*bl?e*".</param>
        /// <returns>A Regex built from the converted pattern.</returns>
        public static Regex DosLikeExpressionFilterToRegEx(string dosLikeExpressionFilter)
        {
            return new Regex(DosLikeExpressionFilterToRegExFilterExpression(dosLikeExpressionFilter));
        }

        // ******************************************************************
    }
}
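Whatever conversion is used, a generated regex only follows DOS wildcard rules if it is anchored: Regex.IsMatch succeeds when the pattern matches anywhere in the input. The small sketch below (hypothetical AnchorDemo class) contrasts the two forms for the pattern bl?e:

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: Regex.IsMatch succeeds if "bl.e" occurs anywhere in the input.
    public static bool Unanchored(string input) =>
        Regex.IsMatch(input, "(?i)bl.e");

    // Anchored: ^ and $ force the whole input to match, as DOS wildcards do.
    public static bool Anchored(string input) =>
        Regex.IsMatch(input, "(?i)^bl.e$");
}
```

Here Unanchored("blue sky") is true but Anchored("blue sky") is false, while Anchored("Blue") is true.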

4 answers:

Answer 0 (score: 2)

               Any single character    Any number of characters   Character range
Glob syntax            ?                           *                    [0-9]
Regex syntax           .                           .*                   [0-9]

So Bl?e (glob) becomes Bl.e (regex), and *Bl?e* becomes .*Bl.e.*

As Joey correctly pointed out, you can (usually, depending on the regex engine) prepend (?i) to the regex to make it case-insensitive.
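A quick sketch of both ways to get case-insensitive matching in .NET, applied to the question's list with the converted pattern .*Bl.e.* (the class name CaseInsensitiveDemo is hypothetical):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaseInsensitiveDemo
{
    static readonly string[] Colors = { "Blue", "Black", "Red", "Light blue", "Light black" };

    // Inline option: (?i) at the start of the pattern turns on case-insensitive matching.
    public static string[] WithInlineOption() =>
        Colors.Where(c => Regex.IsMatch(c, "(?i).*Bl.e.*")).ToArray();

    // Equivalent: leave the pattern alone and pass RegexOptions.IgnoreCase instead.
    public static string[] WithRegexOptions() =>
        Colors.Where(c => Regex.IsMatch(c, ".*Bl.e.*", RegexOptions.IgnoreCase)).ToArray();
}
```

Both should keep "Blue" and "Light blue". Because the pattern begins and ends with .*, anchoring makes no difference for this particular glob.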

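Note that many characters that have no special meaning in a globbing pattern do have a special meaning in a regex, so you can't just do a simple search-and-replace from glob to regex.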
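One common way to handle those characters, sketched here under the assumption that only '*' and '?' need support (no [0-9] ranges, no escaping of wildcards), is to run the whole glob through Regex.Escape() first and only then turn the escaped \* and \? back into .* and . (the GlobToRegex name is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class GlobToRegex
{
    // Converts a '*'/'?' glob into an anchored regex pattern.
    // Regex.Escape neutralises characters such as '.', '(' or '+' that are literal
    // in a glob but special in a regex; it also turns '*' into "\*" and '?' into "\?",
    // which are then replaced by their regex counterparts.
    public static string Convert(string glob) =>
        "^" + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
}
```

For example, Convert("*bl?e*") gives ^.*bl.e.*$, while the dot in a.b stays literal (^a\.b$). Case-insensitivity still has to be added, via (?i) or RegexOptions.IgnoreCase.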

Answer 1 (score: 1)

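I needed to solve the same problem (filtering an arbitrary list of strings with a * and ? wildcard pattern taken from user input), but with an extension: it should be possible to escape the star or question mark so they can be searched for literally.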

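Since the SQL LIKE operator (whose wildcards are % and _) usually offers backslash for escaping, I took the same approach. This complicates the otherwise simple approach of calling Regex.Escape() and then replacing * with .* and ? with . in the regex (see the many other answers to this question).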

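The following code outlines an approach that creates a regex for a given wildcard pattern. It is implemented as an extension method on C# strings. The doc tags and comments should explain the code fully: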

using System.Text.RegularExpressions;

public static class MyStringExtensions
{
    /// <summary>Interprets this string as a wildcard pattern and creates a corresponding regular expression.
    /// Rules for simple wildcard matching are:
    /// * Matches any character zero or more times.
    /// ? Matches any character exactly one time.
    /// \ Backslash can be used to escape the above wildcards (and itself) for an explicit match,
    /// e.g. \* then matches a single star, \? matches a question mark and \\ matches a backslash.
    /// If \ is not followed by a star, question mark or backslash, it also matches a single backslash.
    /// Character set matching (with square brackets []) is NOT supported by this implementation.
    /// </summary>
    /// <param name="wildPat">This string, used as the wildcard match pattern.</param>
    /// <param name="caseSens">Optional parameter for case-sensitive matching; the default is case-insensitive.</param>
    /// <returns>A new regular expression instance performing the requested matches.
    /// If the input string is null or empty, null is returned.</returns>
    public static Regex CreateWildcardRegEx(this string wildPat, bool caseSens = false)
    {
        if (string.IsNullOrEmpty(wildPat))
            return null;

        // 1. STEP: Escape all special characters used in Regex later to avoid unwanted behavior.
        // Regex.Escape() escapes each of the following characters: \*+?|{[()^$.# and white space
        wildPat = Regex.Escape(wildPat);

        // 2. STEP: Replace the three escape sequences that may occur in our wildcard pattern
        // with temporary substrings that CANNOT exist anymore after STEP 1.
        // Prepare some constant strings used below; @ makes C# string literals verbatim, so a backslash need not be doubled!
        const string esc    = @"\\";    // Matches a backslash in a Regex
        const string any    = @"\*";    // Matches a star in a Regex
        const string sgl    = @"\?";    // Matches a question mark in a Regex
        const string tmpEsc = @"||\";   // Instead of a doubled |, any character that Regex.Escape() escapes would do (except \ itself!)
        const string tmpAny =  "||*";   // See comment above
        const string tmpSgl =  "||?";   // See comment above

        // Note that string.Replace() in C# does NOT stop after the first match; it replaces all occurrences
        wildPat = wildPat.Replace(Regex.Escape(esc), tmpEsc)
                         .Replace(Regex.Escape(any), tmpAny)
                         .Replace(Regex.Escape(sgl), tmpSgl);

        // 3. STEP: Substitute our simple wildcards (escaped in STEP 1) with their Regex counterparts.
        const string regAny = ".*";             // Matches any character zero or more times in a Regex
        wildPat = wildPat.Replace(any, regAny)
                         .Replace(sgl, ".");    // . matches any character in a Regex

        // 4. STEP: Revert the temporary replacements of 2. STEP (in reverse order) and replace with what a Regex really needs to match
        wildPat = wildPat.Replace(tmpSgl, sgl)
                         .Replace(tmpAny, any)
                         .Replace(tmpEsc, esc);

        // 5. STEP: (Optional, for performance) Collapse multiple consecutive * wildcards (cases like ******* or similar)
        // into a single regAny, using one Regex.Replace() instead of string.Contains() with string.Replace() in a while loop
        wildPat = Regex.Replace(wildPat, @"(\.\*){2,}", regAny);

        // 6. STEP: Finalize the Regex with start (^) and end ($) anchors
        return new Regex('^' + wildPat + '$', caseSens ? RegexOptions.None : RegexOptions.IgnoreCase);

        // Steps 2 and 4 would be unnecessary if we did not want to be able to escape * and ? characters for search
    }
}

Answer 2 (score: 0)

Try this regex:

^([\w,\s]*bl\we[\w,\s]*) 

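It basically matches any run of word characters and whitespace that contains a word starting with "bl", followed by exactly one more character and then "e". Or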

^([\w,\s]*bl(\w+)e[\w,\s]*)

if you want to match any word that starts with "bl" and ends with "e".
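A quick way to try the first pattern against the question's list. Note that the pattern as written is case-sensitive, so on its own it would miss "Blue"; the sketch adds RegexOptions.IgnoreCase as an assumption (WordPatternDemo is a hypothetical name):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // First pattern from this answer: "bl", one word character, then "e",
    // surrounded by word characters, commas or whitespace.
    // RegexOptions.IgnoreCase is an addition, not part of the answer.
    static readonly Regex OneChar =
        new Regex(@"^([\w,\s]*bl\we[\w,\s]*)", RegexOptions.IgnoreCase);

    public static string[] Filter(string[] items) =>
        items.Where(s => OneChar.IsMatch(s)).ToArray();
}
```

On the question's list this keeps "Blue" and "Light blue".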

Another option is to use some approximate (fuzzy) string matching algorithm. Not sure whether that's exactly what you're looking for.

Answer 3 (score: 0)

Just for reference... I actually used this code:
