需要使用Regex对字符串执行通配符(*,?等)搜索

时间:2011-08-02 05:40:31

标签: c# .net regex string wildcard

我需要执行通配符(*?等)搜索字符串。 这就是我所做的:

string input = "Message";
string pattern = "d*";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

if (regex.IsMatch(input))
{
    MessageBox.Show("Found");
}
else
{
    MessageBox.Show("Not Found");
}

使用上面的代码“Found”块正在命中,但实际上它不应该!

如果我的模式是“e *”,那么只有“找到”才能命中。

我的理解或要求是d * search应该找到包含“d”的文本,后跟任何字符。

我应该将模式更改为“d。*”和“e。*”吗? .NET for Wild Card是否支持在使用Regex类时在内部执行?

11 个答案:

答案 0 :(得分:114)

来自http://www.codeproject.com/KB/recipes/wildcardtoregex.aspx

public static string WildcardToRegex(string pattern)
{
    return "^" + Regex.Escape(pattern)
                      .Replace(@"\*", ".*")
                      .Replace(@"\?", ".")
               + "$";
}

foo*.xls?之类的内容会转换为^foo.*\.xls.$

答案 1 :(得分:21)

您可以使用名为LikeString的Visual Basic函数在没有RegEx的情况下执行简单的通配符。

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

if (Operators.LikeString("This is just a test", "*just*", CompareMethod.Text))
{
  Console.WriteLine("This matched!");
}

如果您使用CompareMethod.Text,它将比较不区分大小写。对于区分大小写的比较,您可以使用CompareMethod.Binary

此处有更多信息:http://www.henrikbrinch.dk/Blog/2012/02/14/Wildcard-matching-in-C

MSDN:http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.compilerservices.operators.likestring%28v=vs.100%29.ASPX

答案 2 :(得分:10)

glob表达式d*的正确正则表达式为^d,表示匹配以d开头的任何内容。

    string input = "Message";
    string pattern = @"^d";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

(在这种情况下,@引用不是必需的,但是很好的做法,因为许多正则表达式都使用需要单独保留的反斜杠转义符,并且它还向读者表明此字符串是特殊的。)

答案 3 :(得分:6)

Windows和* nux以不同方式处理通配符。 Windows *?.以非常复杂的方式处理,一个人的存在或位置会改变另一个人的意思。虽然* nux保持简单,但它只是一个简单的模式匹配。除此之外,Windows将?与0或1个字符匹配,Linux将其与1个字符匹配。

我没有找到关于这个问题的权威文档,这里只是基于Windows 8 / XP测试天数的结论(命令行,dir命令是特定的,Directory.GetFiles方法也使用相同的规则)和Ubuntu Server 12.04.1(ls命令)。我做了几十个常见和不常见的案例,虽然也有很多失败的案例。

Gabe目前的回答,就像* nux一样。如果你也想要一个Windows风格的,并且愿意接受这个不完美的东西,那么它就是:

    /// <summary>
    /// <para>Tests if a file name matches the given wildcard pattern, uses the same rule as shell commands.</para>
    /// </summary>
    /// <param name="fileName">The file name to test, without folder.</param>
    /// <param name="pattern">A wildcard pattern which can use char * to match any amount of characters; or char ? to match one character.</param>
    /// <param name="unixStyle">If true, use the *nix style wildcard rules; otherwise use windows style rules.</param>
    /// <returns>true if the file name matches the pattern, false otherwise.</returns>
    public static bool MatchesWildcard(this string fileName, string pattern, bool unixStyle)
    {
        if (fileName == null)
            throw new ArgumentNullException("fileName");

        if (pattern == null)
            throw new ArgumentNullException("pattern");

        if (unixStyle)
            return WildcardMatchesUnixStyle(pattern, fileName);

        return WildcardMatchesWindowsStyle(fileName, pattern);
    }

    private static bool WildcardMatchesWindowsStyle(string fileName, string pattern)
    {
        var dotdot = pattern.IndexOf("..", StringComparison.Ordinal);
        if (dotdot >= 0)
        {
            for (var i = dotdot; i < pattern.Length; i++)
                if (pattern[i] != '.')
                    return false;
        }

        var normalized = Regex.Replace(pattern, @"\.+$", "");
        var endsWithDot = normalized.Length != pattern.Length;

        var endWeight = 0;
        if (endsWithDot)
        {
            var lastNonWildcard = normalized.Length - 1;
            for (; lastNonWildcard >= 0; lastNonWildcard--)
            {
                var c = normalized[lastNonWildcard];
                if (c == '*')
                    endWeight += short.MaxValue;
                else if (c == '?')
                    endWeight += 1;
                else
                    break;
            }

            if (endWeight > 0)
                normalized = normalized.Substring(0, lastNonWildcard + 1);
        }

        var endsWithWildcardDot = endWeight > 0;
        var endsWithDotWildcardDot = endsWithWildcardDot && normalized.EndsWith(".");
        if (endsWithDotWildcardDot)
            normalized = normalized.Substring(0, normalized.Length - 1);

        normalized = Regex.Replace(normalized, @"(?!^)(\.\*)+$", @".*");

        var escaped = Regex.Escape(normalized);
        string head, tail;

        if (endsWithDotWildcardDot)
        {
            head = "^" + escaped;
            tail = @"(\.[^.]{0," + endWeight + "})?$";
        }
        else if (endsWithWildcardDot)
        {
            head = "^" + escaped;
            tail = "[^.]{0," + endWeight + "}$";
        }
        else
        {
            head = "^" + escaped;
            tail = "$";
        }

        if (head.EndsWith(@"\.\*") && head.Length > 5)
        {
            head = head.Substring(0, head.Length - 4);
            tail = @"(\..*)?" + tail;
        }

        var regex = head.Replace(@"\*", ".*").Replace(@"\?", "[^.]?") + tail;
        return Regex.IsMatch(fileName, regex, RegexOptions.IgnoreCase);
    }

    private static bool WildcardMatchesUnixStyle(string pattern, string text)
    {
        var regex = "^" + Regex.Escape(pattern)
                               .Replace("\\*", ".*")
                               .Replace("\\?", ".")
                    + "$";

        return Regex.IsMatch(text, regex);
    }

有一个有趣的事情,即使Windows API PathMatchSpec也不同意FindFirstFile。只需尝试a1*.FindFirstFile说匹配a1PathMatchSpec说不匹配。

答案 4 :(得分:3)

d*表示它应匹配零个或多个“d”字符。所以任何字符串都是有效的匹配。请改为d+

为了支持通配符模式,我会用RegEx等价物替换通配符。与*成为.*?成为.?。然后,您上面的表达式变为d.*

答案 5 :(得分:3)

您需要将通配符表达式转换为正则表达式。例如:

    private bool WildcardMatch(String s, String wildcard, bool case_sensitive)
    {
        // Replace the * with an .* and the ? with a dot. Put ^ at the
        // beginning and a $ at the end
        String pattern = "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

        // Now, run the Regex as you already know
        Regex regex;
        if(case_sensitive)
            regex = new Regex(pattern);
        else
            regex = new Regex(pattern, RegexOptions.IgnoreCase);

        return(regex.IsMatch(s));
    } 

答案 6 :(得分:3)

您必须在输入通配符模式中转义特殊的正则表达式符号(例如,模式*.txt将等同于^.*\.txt$) 因此,斜杠,大括号和许多特殊符号必须替换为@"\" + s,其中s - 特殊正则表达式符号。

答案 7 :(得分:0)

所有高级代码都不正确。

这是因为在搜索zz * foo *或zz *时,您将无法获得正确的结果。

如果你在TotalCommander的“abcd”中搜索“abcd *”,他会找到一个abcd文件,所以所有高级代码都是错误的。

这是正确的代码。

public string WildcardToRegex(string pattern)
{             
    string result= Regex.Escape(pattern).
        Replace(@"\*", ".+?").
        Replace(@"\?", "."); 

    if (result.EndsWith(".+?"))
    {
        result = result.Remove(result.Length - 3, 3);
        result += ".*";
    }

    return result;
}

答案 8 :(得分:0)

您可能希望使用System.Management.Automation程序集中的WildcardPattern。请参阅我的回答here

答案 9 :(得分:0)

我认为@Dmitri有很好的解决方案 使用通配符匹配字符串 https://stackoverflow.com/a/30300521/1726296

根据他的解决方案,我创建了两个扩展方法。 (功劳归于他)

可能会有帮助。

for index,row in df.iterrows():
        multiplier = (np.random.randint(1,5+1))
        row['Impresions']*=multiplier
        row['Clicks']*=(np.random.randint(1,multiplier+1))
        row['Ctr']= float(row['Clicks'])/float(row['Impresions'])
        row['Mult']=multiplier
        #print (row['Clicks'],row['Impresions'],row['Ctr'],row['Mult'])

用法:

public static String WildCardToRegular(this String value)
{
        return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}

public static bool WildCardMatch(this String value,string pattern,bool ignoreCase = true)
{
        if (ignoreCase)
            return Regex.IsMatch(value, WildCardToRegular(pattern), RegexOptions.IgnoreCase);

        return Regex.IsMatch(value, WildCardToRegular(pattern));
}

string pattern = "file.*";

var isMatched = "file.doc".WildCardMatch(pattern)

答案 10 :(得分:0)

public class Wildcard
{
    private readonly string _pattern;

    public Wildcard(string pattern)
    {
        _pattern = pattern;
    }

    public static bool Match(string value, string pattern)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end);
    }

    public static bool Match(string value, string pattern, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return Match(value, pattern, ref start, ref end, toLowerTable);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end);
    }

    public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
    {
        return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end);
    }

    public bool IsMatch(string str, char[] toLowerTable)
    {
        int start = -1;
        int end = -1;
        return IsMatch(str, ref start, ref end, toLowerTable);
    }

    public bool IsMatch(string str, ref int start, ref int end)
    {
        if (_pattern.Length == 0) return false;
        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;
                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (str[si] == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && str[si] != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }

    public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
    {
        if (_pattern.Length == 0) return false;

        int pindex = 0;
        int sindex = 0;
        int pattern_len = _pattern.Length;
        int str_len = str.Length;
        start = -1;
        while (true)
        {
            bool star = false;
            if (_pattern[pindex] == '*')
            {
                star = true;
                do
                {
                    pindex++;
                }
                while (pindex < pattern_len && _pattern[pindex] == '*');
            }
            end = sindex;
            int i;
            while (true)
            {
                int si = 0;
                bool breakLoops = false;

                for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
                {
                    si = sindex + i;
                    if (si == str_len)
                    {
                        return false;
                    }
                    char c = toLowerTable[str[si]];
                    if (c == _pattern[pindex + i])
                    {
                        continue;
                    }
                    if (si == str_len)
                    {
                        return false;
                    }
                    if (_pattern[pindex + i] == '?' && c != '.')
                    {
                        continue;
                    }
                    breakLoops = true;
                    break;
                }
                if (breakLoops)
                {
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    if (si == str_len)
                    {
                        return false;
                    }
                }
                else
                {
                    if (start == -1)
                    {
                        start = sindex;
                    }
                    if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
                    {
                        break;
                    }
                    if (sindex + i == str_len)
                    {
                        if (end <= start)
                        {
                            end = str_len;
                        }
                        return true;
                    }
                    if (i != 0 && _pattern[pindex + i - 1] == '*')
                    {
                        return true;
                    }
                    if (!star)
                    {
                        return false;
                    }
                    sindex++;
                    continue;
                }
            }
            sindex += i;
            pindex += i;
            if (start == -1)
            {
                start = sindex;
            }
        }
    }
}

使用:

var ismatch2 = Wildcard.Match("xmlfsdfsd", "xml*");