Wildcard matching in text strings

Time: 2014-01-28 15:20:36

Tags: regex algorithm wildcard string-matching

A friend gave me this wildcard (*) matching algorithm. Here is the code.

//This function compares text strings, one of which can have wildcards ('*').
//
BOOL GeneralTextCompare(
        char * pTameText,             // A string without wildcards
        char * pWildText,             // A (potentially) corresponding string with wildcards
        BOOL bCaseSensitive = FALSE,  // By default, match on 'X' vs 'x'
        char cAltTerminator = '\0'    // For function names, for example, you can stop at the first '('
)
{
        BOOL bMatch = TRUE;
        char * pAfterLastWild = NULL; // The location after the last '*', if we’ve encountered one
        char * pAfterLastTame = NULL; // The location in the tame string, from which we started after last wildcard
        char t, w;

        // Walk the text strings one character at a time.
        while (1)
        {
                t = *pTameText;
                w = *pWildText;

                // How do you match a unique text string?
                if (!t || t == cAltTerminator)
                {
                        // Easy: unique up on it!
                        if (!w || w == cAltTerminator)
                        {
                                break;                                   // "x" matches "x"
                        }
                        else if (w == '*')
                        {
                                pWildText++;
                                continue;                           // "x*" matches "x" or "xy"
                        }
                        else if (pAfterLastTame)
                        {
                                if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
                                {
                                        bMatch = FALSE;
                                        break;
                                }
                                pTameText = pAfterLastTame++;
                                pWildText = pAfterLastWild;
                                continue;
                        }

                        bMatch = FALSE;
                        break;                                           // "x" doesn't match "xy"
                }
                else
                {
                        if (!bCaseSensitive)
                        {
                                // Lowercase the characters to be compared.
                                if (t >= 'A' && t <= 'Z')
                                {
                                        t += ('a' - 'A');
                                }

                                if (w >= 'A' && w <= 'Z')
                                {
                                        w += ('a' - 'A');
                                }
                        }

                        // How do you match a tame text string?
                        if (t != w)
                        {
                                // The tame way: unique up on it!
                                if (w == '*')
                                {
                                        pAfterLastWild = ++pWildText;
                                        pAfterLastTame = pTameText;
                                        w = *pWildText;

                                        if (!w || w == cAltTerminator)
                                        {
                                                break;                           // "*" matches "x"
                                        }
                                        continue;                           // "*y" matches "xy"
                                }
                                else if (pAfterLastWild)
                                {
                                        if (pAfterLastWild != pWildText)
                                        {
                                                pWildText = pAfterLastWild;
                                                w = *pWildText;

                                                if (!bCaseSensitive && w >= 'A' && w <= 'Z')
                                                {
                                                        w += ('a' - 'A');
                                                }

                                                if (t == w)
                                                {
                                                        pWildText++;
                                                }
                                        }
                                        pTameText++;
                                        continue;                           // "*sip*" matches "mississippi"
                                }
                                else
                                {
                                        bMatch = FALSE;
                                        break;                                   // "x" doesn't match "y"
                                }
                        }
                }

                pTameText++;
                pWildText++;
        }

        return bMatch;
}

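As far as I understand it, the algorithm proceeds as follows (remaining tame text on the left, remaining wild text on the right):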

mississippi     *sip*
mississippi     sip*
ississippi      sip*
ssissippi       sip*
sissippi        ip*
sissippi        sip*    (pAfterLastWild is used to restore the wild position)
issippi         ip*
ssippi          p*
ssippi          sip*    (pAfterLastWild is used again here)
sippi           ip*
sippi           sip*    (and here as well)
ippi            ip*
ppi             p*
pi              *
i               *

I can't figure out why pAfterLastTame is needed or what the following code does, because I haven't been able to find an input that reaches it.

else if (pAfterLastTame)
                        {
                                if (!(*pAfterLastTame) || *pAfterLastTame == cAltTerminator)
                                {
                                        bMatch = FALSE;
                                        break;
                                }
                                pTameText = pAfterLastTame++;
                                pWildText = pAfterLastWild;
                                continue;
                        }
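
One way to see what a saved tame position is for is to look at a smaller matcher built on the same idea. The sketch below is not the GeneralTextCompare function above: `wildcardMatch`, `MatchResult` and the index names are hypothetical, and it ignores case folding and cAltTerminator. It keeps two bookmarks that play roles similar to pAfterLastWild and pAfterLastTame. When a match attempt after a '*' fails, including when the tame text runs out while the wild text still has characters left, it restarts one character later in the tame text. It also counts character comparisons.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct MatchResult {
    bool matched;
    std::size_t comparisons;  // character comparisons performed
};

// Minimal '*'-only matcher (case-sensitive, no alternate terminator).
MatchResult wildcardMatch(const std::string& tame, const std::string& wild)
{
    const std::size_t npos = std::string::npos;
    std::size_t t = 0, w = 0;
    std::size_t afterLastWild = npos;  // index in wild just past the last '*' seen
    std::size_t afterLastTame = 0;     // index in tame where the current attempt began
    std::size_t comparisons = 0;

    while (t < tame.size()) {
        if (w < wild.size() && wild[w] == '*') {
            // Bookmark both positions so a failed attempt can be retried.
            afterLastWild = ++w;
            afterLastTame = t;
            continue;
        }
        if (w < wild.size()) {
            ++comparisons;
            if (wild[w] == tame[t]) { ++t; ++w; continue; }
        }
        // Mismatch, or wild text exhausted while tame text remains.
        if (afterLastWild == npos) {
            return {false, comparisons};  // no '*' to fall back on
        }
        // Let the last '*' absorb one more tame character and try again.
        t = ++afterLastTame;
        w = afterLastWild;
    }

    // Tame text is exhausted: only trailing '*' may remain in wild.
    while (w < wild.size() && wild[w] == '*') ++w;
    return {w == wild.size(), comparisons};
}
```

With this sketch, `wildcardMatch("aaab", "*aab")` first tries to match "aab" at index 0, fails on the 'b', and succeeds only after restarting from the bookmarked tame position plus one. That takes 6 comparisons for a 4-character tame string, and `wildcardMatch("mississippi", "*sip*")` takes 13 for 11 characters, so in this sketch the comparison count is not bounded by the tame string's length.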

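This algorithm seems very fast, because the number of comparisons equals the length of the tame string (correct me if I'm wrong).
Does anyone know of a more efficient algorithm than this?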
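
For comparison, one alternative for patterns that contain only '*' is to split the wild text on '*' and locate each literal segment in order, anchoring the first segment at the start and the last at the end. The sketch below assumes case-sensitive matching and no alternate terminator, and `segmentMatch` is a hypothetical name. Whether it is actually faster depends on the substring search behind `std::string::find`, whose complexity the C++ standard does not specify, so treat it as something to benchmark rather than a guaranteed improvement.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: match a '*'-only pattern by searching for its literal segments.
bool segmentMatch(const std::string& tame, const std::string& wild)
{
    const std::size_t npos = std::string::npos;

    // Split the wild text on '*'; "a*b*" becomes {"a", "b", ""}.
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        std::size_t star = wild.find('*', start);
        if (star == npos) {
            parts.push_back(wild.substr(start));
            break;
        }
        parts.push_back(wild.substr(start, star - start));
        start = star + 1;
    }

    // No wildcard at all: the strings must be identical.
    if (parts.size() == 1) return tame == wild;

    const std::string& first = parts.front();  // must be a prefix of tame
    const std::string& last = parts.back();    // must be a suffix of tame
    if (tame.size() < first.size() + last.size()) return false;
    if (tame.compare(0, first.size(), first) != 0) return false;
    if (tame.compare(tame.size() - last.size(), last.size(), last) != 0) return false;

    // Middle segments: take the leftmost occurrence of each, in order,
    // without running into the region reserved for the suffix.
    std::size_t pos = first.size();
    const std::size_t limit = tame.size() - last.size();
    for (std::size_t i = 1; i + 1 < parts.size(); ++i) {
        if (parts[i].empty()) continue;  // consecutive '*' characters
        std::size_t found = tame.find(parts[i], pos);
        if (found == npos || found + parts[i].size() > limit) return false;
        pos = found + parts[i].size();
    }
    return true;
}
```

Taking the leftmost occurrence of each middle segment is safe because an earlier match never leaves less room for the segments that follow.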

0 answers:

No answers