Question

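Here is the interview question:

Machine coding round (time: 1 hour)

Given an expression and a string testCase, evaluate whether testCase is valid for the expression.

The expression may contain:

letters [a-z]

'.' ('.' represents any character in [a-z])

'*' ('*' has the same properties as in a normal RegExp)

'^' ('^' denotes the start of the string)

'$' ('$' denotes the end of the string)

Sample cases: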
Expression   Test Case   Valid
ab           ab          true 
a*b          aaaaaab     true 
a*b*c*       abc         true 
a*b*c        aaabccc     false 
^abc*b       abccccb     true 
^abc*b       abbccccb    false 
^abcd$       abcd        true 
^abc*abc$    abcabc      true 
^abc.abc$    abczabc     true 
^ab..*abc$   abyxxxxabc  true

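My approach:

1. Convert the given regular expression into concatenation (ab), alternation (a|b), and Kleene star (a*) operations, adding + as an explicit concatenation operator. For example:
```
abc$  =>  .*+a+b+c
^ab..*abc$  => a+b+.+.*+a+b+c
```
2. Convert to postfix notation according to precedence (parentheses > Kleene star > concatenation > ...):
```
(a|b)*+c  =>  ab|*c+
```
3. Build the NFA using Thompson's construction.
4. Backtrack through / traverse the NFA by maintaining a set of states.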
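
The first step can be sketched in a few lines of Java. This is only an illustration, not the asker's actual code: the class and method names (`ExplicitConcat`, `toExplicitConcat`) are hypothetical, the pattern is assumed to be well formed, and appending `.*` when there is no trailing `$` is an assumption made by symmetry with the leading `.*` in the first example.

```java
import java.util.ArrayList;
import java.util.List;

final class ExplicitConcat {
    // Hypothetical helper: rewrites a pattern into units joined by an explicit '+'.
    static String toExplicitConcat(String pattern) {
        boolean anchoredStart = pattern.startsWith("^");
        boolean anchoredEnd = pattern.endsWith("$");
        String body = pattern.substring(anchoredStart ? 1 : 0,
                pattern.length() - (anchoredEnd ? 1 : 0));

        List<String> units = new ArrayList<>();
        if (!anchoredStart) {
            units.add(".*"); // no '^': anything may precede the match
        }
        for (int i = 0; i < body.length(); i++) {
            boolean starred = i + 1 < body.length() && body.charAt(i + 1) == '*';
            units.add(starred ? body.substring(i, i + 2) : body.substring(i, i + 1));
            if (starred) {
                i++; // the '*' belongs to the unit just added
            }
        }
        if (!anchoredEnd) {
            units.add(".*"); // assumption: no '$' means anything may follow
        }
        return String.join("+", units);
    }
}
```

With this sketch, `toExplicitConcat("abc$")` gives `.*+a+b+c` and `toExplicitConcat("^ab..*abc$")` gives `a+b+.+.*+a+b+c`, matching the two examples in step 1.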

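When I started implementing it, it took me well over an hour. I felt that step 3 was very time-consuming. I built the NFA from the postfix notation with a stack, adding new states and transitions as needed.

So I'd like to know whether there is a faster alternative solution to this problem, or a faster way to implement step 3. I found this CareerCup link, where someone mentioned in the comments that the problem came from a programming contest. If anyone has solved this before, or has a better solution, I'd be glad to know where I went wrong.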

Answer 1

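A derivative of Levenshtein distance comes to mind. It may not be the fastest possible algorithm, but it should be quick to implement.

We can ignore ^ at the start and $ at the end; anywhere else they are invalid.

Then we construct a 2D grid in which each row represents a unit [1] of the expression and each column represents a character of the test string.

[1]: A "unit" here means a single character, except that a * is attached to the character before it.

So for a*b*c and aaabccc, we'd get something like: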

   a a a b c c c
a*
b*
c

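Each cell can hold a boolean indicating validity.

Now, for each cell, set it to valid if any of the following holds:

The value in the left neighbour is valid, the row is x* or .*, and the column is x (x being any character a-z).
This corresponds to a * matching one additional character.

The value in the upper-left neighbour is valid, the row is x or ., and the column is x (x being any character a-z).
This corresponds to a single-character match.

The value in the top neighbour is valid and the row is x* or .*.
This corresponds to the * matching nothing.

Then check whether the bottom-right-most cell is valid.

So, for the above example, we'd get (V indicating valid):

   a a a b c c c
a* V V V - - - -
b* - - - V - - -
c  - - - - V - -

Since the bottom-right cell isn't valid, we return invalid.

Running time: O(stringLength*expressionLength).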
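
The three rules can be sketched as a table-filling loop in Java. This is a minimal illustration rather than the answerer's own code: `GridMatcher` and `matches` are hypothetical names, the pattern is assumed to be well formed, and the table carries one extra virtual row and column for the empty prefix so that the neighbour lookups never fall off the grid.

```java
import java.util.ArrayList;
import java.util.List;

final class GridMatcher {
    // Hypothetical sketch of the grid approach; the whole string must match.
    static boolean matches(String expression, String text) {
        // Ignore a leading '^' and a trailing '$'.
        int from = expression.startsWith("^") ? 1 : 0;
        int to = expression.length() - (expression.endsWith("$") ? 1 : 0);

        // Split into units: a character, optionally followed by '*'.
        List<Character> chars = new ArrayList<>();
        List<Boolean> starred = new ArrayList<>();
        for (int i = from; i < to; i++) {
            boolean star = i + 1 < to && expression.charAt(i + 1) == '*';
            chars.add(expression.charAt(i));
            starred.add(star);
            if (star) {
                i++; // skip the '*' that was attached to this unit
            }
        }

        int rows = chars.size();
        int cols = text.length();
        // valid[r][c]: the first r units match the first c characters.
        boolean[][] valid = new boolean[rows + 1][cols + 1];
        valid[0][0] = true; // empty expression matches the empty string
        for (int r = 1; r <= rows; r++) {
            char unit = chars.get(r - 1);
            boolean star = starred.get(r - 1);
            for (int c = 0; c <= cols; c++) {
                boolean charOk = c > 0 && (unit == '.' || unit == text.charAt(c - 1));
                if (star) {
                    // top neighbour: x* matches nothing;
                    // left neighbour: x* matches one additional character
                    valid[r][c] = valid[r - 1][c] || (charOk && valid[r][c - 1]);
                } else {
                    // upper-left neighbour: single-character match
                    valid[r][c] = charOk && valid[r - 1][c - 1];
                }
            }
        }
        return valid[rows][cols]; // bottom-right cell
    }
}
```

Under these assumptions, `GridMatcher.matches("a*b*c", "aaabccc")` returns false and `GridMatcher.matches("^ab..*abc$", "abyxxxxabc")` returns true, in line with the sample cases in the question.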

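You should notice that we mostly explore only a fairly small part of the grid.

This solution can be improved by turning it into a recursive solution that uses memoization (and just calling the recursive solution for the bottom-right cell).

This gives a best-case performance of O(1), but still a worst case of O(stringLength*expressionLength).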
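
A memoized version might look like the following Java sketch. The names `MemoMatcher`, `matches`, and `valid` are hypothetical, the pattern is assumed to be well formed (a `*` always follows a letter or `.`), and the recursion is started at the bottom-right cell as the answer suggests.

```java
final class MemoMatcher {
    // Hypothetical sketch: the whole string must match the expression.
    static boolean matches(String expression, String text) {
        // Ignore a leading '^' and a trailing '$'.
        int from = expression.startsWith("^") ? 1 : 0;
        int to = expression.length() - (expression.endsWith("$") ? 1 : 0);
        String pattern = expression.substring(from, to);
        // null means "not computed yet".
        Boolean[][] memo = new Boolean[pattern.length() + 1][text.length() + 1];
        return valid(pattern, pattern.length(), text, text.length(), memo);
    }

    // Is the cell for pattern[0..p) and text[0..t) valid?
    private static boolean valid(String pattern, int p, String text, int t, Boolean[][] memo) {
        if (p == 0) {
            return t == 0; // empty expression matches only the empty string
        }
        if (memo[p][t] != null) {
            return memo[p][t];
        }
        boolean result;
        if (pattern.charAt(p - 1) == '*') {
            char unit = pattern.charAt(p - 2);
            boolean charOk = t > 0 && (unit == '.' || unit == text.charAt(t - 1));
            // top neighbour (x* matches nothing) or left neighbour (one more character)
            result = valid(pattern, p - 2, text, t, memo)
                    || (charOk && valid(pattern, p, text, t - 1, memo));
        } else {
            char unit = pattern.charAt(p - 1);
            boolean charOk = t > 0 && (unit == '.' || unit == text.charAt(t - 1));
            // upper-left neighbour: single-character match
            result = charOk && valid(pattern, p - 1, text, t - 1, memo);
        }
        memo[p][t] = result;
        return result;
    }
}
```

The best case shows up when the last unit is a plain character that differs from the last character of the string: the very first call returns false without exploring any other cell.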

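My solution assumes the expression must match the entire string, as inferred from the result of the above example being invalid (as per the question).

If it can instead match a substring, we can modify this slightly so that a cell in the top row is valid if:

The row is x* or .*.
The row is x or . and the column is x.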
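
One way to read the substring variant is sketched below in Java. It goes a little beyond what the answer states, so treat every detail as an assumption: `SubstringGridMatcher` and `matchesSubstring` are hypothetical names, a virtual row above the top row is marked valid in every column (which yields exactly the two top-row rules), `^` restricts that virtual row to the first column, `$` requires the last column, and without `$` any valid cell in the last row counts as a match.

```java
final class SubstringGridMatcher {
    // Hypothetical variant: the expression may match any substring
    // unless it is anchored by '^' or '$'.
    static boolean matchesSubstring(String expression, String text) {
        boolean anchoredStart = expression.startsWith("^");
        boolean anchoredEnd = expression.endsWith("$");
        String pattern = expression.substring(anchoredStart ? 1 : 0,
                expression.length() - (anchoredEnd ? 1 : 0));
        int cols = text.length();

        // previous[c]: the units processed so far match a substring ending at column c.
        boolean[] previous = new boolean[cols + 1];
        for (int c = 0; c <= cols; c++) {
            previous[c] = !anchoredStart || c == 0; // a match may start anywhere without '^'
        }
        for (int p = 0; p < pattern.length(); p++) {
            char unit = pattern.charAt(p);
            boolean star = p + 1 < pattern.length() && pattern.charAt(p + 1) == '*';
            boolean[] current = new boolean[cols + 1];
            for (int c = 0; c <= cols; c++) {
                boolean charOk = c > 0 && (unit == '.' || unit == text.charAt(c - 1));
                current[c] = star
                        ? previous[c] || (charOk && current[c - 1])
                        : charOk && previous[c - 1];
            }
            previous = current;
            if (star) {
                p++; // skip the '*' attached to this unit
            }
        }
        if (anchoredEnd) {
            return previous[cols]; // '$': the match must end at the end of the string
        }
        for (boolean cell : previous) {
            if (cell) {
                return true; // the match may end anywhere
            }
        }
        return false;
    }
}
```

Note that under this reading `a*b*c` does match `aaabccc` (through the substring `aaabc`), which differs from the sample table in the question.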

Answer 2

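With only 1 hour, we can use a simple approach.

Split the pattern into tokens: a*b.c => { a* b . c }.

If the pattern doesn't start with ^, add .* at the beginning; otherwise remove the ^.

If the pattern doesn't end with $, add .* at the end; otherwise remove the $.

Then we use recursion: branch 3 ways if we have a repeating pattern (increase the pattern index by 1, increase the word index by 1, or increase both indices by 1), and 1 way if it is not a repeating pattern (increase both indices by 1).

Sample code in C#: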

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ReTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Debug.Assert(IsMatch("ab", "ab") == true);
            Debug.Assert(IsMatch("aaaaaab", "a*b") == true);
            Debug.Assert(IsMatch("abc", "a*b*c*") == true);
            Debug.Assert(IsMatch("aaabccc", "a*b*c") == true); /* original false, but it should be true */
            Debug.Assert(IsMatch("abccccb", "^abc*b") == true);
            Debug.Assert(IsMatch("abbccccb", "^abc*b") == false);
            Debug.Assert(IsMatch("abcd", "^abcd$") == true);
            Debug.Assert(IsMatch("abcabc", "^abc*abc$") == true);
            Debug.Assert(IsMatch("abczabc", "^abc.abc$") == true);
            Debug.Assert(IsMatch("abyxxxxabc", "^ab..*abc$") == true);
        }

        static bool IsMatch(string input, string pattern)
        {
            List<PatternToken> patternTokens = new List<PatternToken>();
            for (int i = 0; i < pattern.Length; i++)
            {
                char token = pattern[i];
                if (token == '^')
                {
                    if (i == 0)
                        patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
                    else
                        throw new ArgumentException("input");
                }
                else if (char.IsLower(token) || token == '.')
                {
                    if (i < pattern.Length - 1 && pattern[i + 1] == '*')
                    {
                        patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Multiple });
                        i++;
                    }
                    else
                        patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
                }
                else if (token == '$')
                {
                    if (i == pattern.Length - 1)
                        patternTokens.Add(new PatternToken { Token = token, Occurence = Occurence.Single });
                    else
                        throw new ArgumentException("input");
                }
                else
                    throw new ArgumentException("input");
            }

            PatternToken firstPatternToken = patternTokens.First();
            if (firstPatternToken.Token == '^')
                patternTokens.RemoveAt(0);
            else
                patternTokens.Insert(0, new PatternToken { Token = '.', Occurence = Occurence.Multiple });

            PatternToken lastPatternToken = patternTokens.Last();
            if (lastPatternToken.Token == '$')
                patternTokens.RemoveAt(patternTokens.Count - 1);
            else
                patternTokens.Add(new PatternToken { Token = '.', Occurence = Occurence.Multiple });

            return IsMatch(input, 0, patternTokens, 0);
        }

        static bool IsMatch(string input, int inputIndex, IList<PatternToken> pattern, int patternIndex)
        {
            if (inputIndex == input.Length)
            {
                if (patternIndex == pattern.Count || (patternIndex == pattern.Count - 1 && pattern[patternIndex].Occurence == Occurence.Multiple))
                    return true;
                else
                    return false;
            }
            else if (inputIndex < input.Length && patternIndex < pattern.Count)
            {
                char c = input[inputIndex];
                PatternToken patternToken = pattern[patternIndex];
                if (patternToken.Token == '.' || patternToken.Token == c)
                {
                    if (patternToken.Occurence == Occurence.Single)
                        return IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
                    else
                        return IsMatch(input, inputIndex, pattern, patternIndex + 1) ||
                               IsMatch(input, inputIndex + 1, pattern, patternIndex) ||
                               IsMatch(input, inputIndex + 1, pattern, patternIndex + 1);
                }
                else
                    return false;
            }
            else
                return false;
        }

        class PatternToken
        {
            public char Token { get; set; }
            public Occurence Occurence { get; set; }

            public override string ToString()
            {
                if (Occurence == Occurence.Single)
                    return Token.ToString();
                else
                    return Token.ToString() + "*";
            }
        }

        enum Occurence
        {
            Single,
            Multiple
        }
    }
}

Answer 3

Here is a solution in Java. Space and time are O(n). Inline comments are provided for clarity:

/**
 * @author Santhosh Kumar
 *
 */
public class ExpressionProblemSolution {

public static void main(String[] args) {
    System.out.println("---------- ExpressionProblemSolution - start ---------- \n");
    ExpressionProblemSolution evs = new ExpressionProblemSolution();
    evs.runMatchTests();
    System.out.println("\n---------- ExpressionProblemSolution - end ---------- ");
}

// simple node structure to keep expression terms
class Node {
    Character ch; // char [a-z]
    Character sch; // special char (^, *, $, .)
    Node next;

    Node(Character ch1, Character sch1) {
        ch = ch1;
        sch = sch1;
    }

    Node add(Character ch1, Character sch1) {
        this.next = new Node(ch1, sch1);
        return this.next;
    }

    Node next() {
        return this.next;
    }

    public String toString() {
        return "[ch=" + ch + ", sch=" + sch + "]";
    }
}

private boolean letters(char ch) {
    return (ch >= 'a' && ch <= 'z');
}

private boolean specialChars(char ch) {
    return (ch == '.' || ch == '^' || ch == '*' || ch == '$');
}

private void validate(String expression) {
    // if expression has invalid chars throw runtime exception
    if (expression == null) {
        throw new RuntimeException(
                "Expression can't be null, but it can be empty");
    }
    char[] expr = expression.toCharArray();
    for (int i = 0; i < expr.length; i++) {
        if (!letters(expr[i]) && !specialChars(expr[i])) {
            throw new RuntimeException(
                    "Expression contains invalid char at position=" + i
                            + ", invalid_char=" + expr[i]
                            + " (allowed chars are 'a-z', '.', '^', '*' and '$')");
        }
    }
}

// Parse the expression and split them into terms and add to list
// the list is FSM (Finite State Machine). The list is used during
// the process step to iterate through the machine states based 
// on the input string
// 
// expression = a*b*c has 3 terms -> [a*] [b*] [c] 
// expression = ^ab.*c$ has 4 terms -> [^a] [b] [.*] [c$]   
//
// Timing : O(n)    n -> expression length
// Space :  O(n)    n -> expression length decides the no.of terms stored in the list
private Node preprocess(String expression) {
    debug("preprocess - start [" + expression + "]");
    validate(expression);
    Node root = new Node(' ', ' '); // root node with empty values
    Node current = root;
    char[] expr = expression.toCharArray();
    int i = 0, n = expr.length;

    while (i < n) {
        debug("i=" + i);
        if (expr[i] == '^') { // it is prefix operator, so it always linked
                                // to the char after that
            if (i + 1 < n) {
                if (i == 0) { // ^ indicates start of the expression, so it
                                // must be first in the expr string
                    current = current.add(expr[i + 1], expr[i]);
                    i += 2;
                    continue;
                } else {
                    throw new RuntimeException(
                            "Special char ^ should be present only at the first position of the expression (position="
                                    + i + ", char=" + expr[i] + ")");
                }
            } else {
                throw new RuntimeException(
                        "Expression missing after ^ (position=" + i
                                + ", char=" + expr[i] + ")");
            }
        } else if (letters(expr[i]) || expr[i] == '.') { // [a-z] or .
            if (i + 1 < n) {
                char nextCh = expr[i + 1];
                if (nextCh == '$' && i + 1 != n - 1) { // if $, then it must
                                                        // be at the last
                                                        // position of the
                                                        // expression
                    throw new RuntimeException(
                            "Special char $ should be present only at the last position of the expression (position="
                                    + (i + 1)
                                    + ", char="
                                    + expr[i + 1]
                                    + ")");
                }
                if (nextCh == '$' || nextCh == '*') { // a* or b$
                    current = current.add(expr[i], nextCh);
                    i += 2;
                    continue;
                } else {
                    current = current.add(expr[i], expr[i] == '.' ? expr[i]
                            : null);
                    i++;
                    continue;
                }
            } else { // a or b
                current = current.add(expr[i], null);
                i++;
                continue;
            }
        } else {
            throw new RuntimeException("Invalid char - (position=" + (i)
                    + ", char=" + expr[i] + ")");
        }
    }

    debug("preprocess - end");
    return root;
}

// Traverse over the terms in the list and iterate and match the input string
// The terms list is the FSM (Finite State Machine); the end of list indicates
// end state. That is, input is valid and matching the expression
//
// Timing : O(n) for pre-processing + O(n) for processing = 2O(n) = ~O(n) where n -> expression length
// Timing : O(2n) ~ O(n)
// Space :  O(n)    where n -> expression length decides the no.of terms stored in the list
public boolean process(String expression, String testString) {
    Node root = preprocess(expression);
    print(root);
    Node current = root.next();
    if (root == null || current == null)
        return false;
    int i = 0;
    int n = testString.length();
    debug("input-string-length=" + n);
    char[] test = testString.toCharArray();
    // while (i < n && current != null) {
    while (current != null) {
        debug("process: i=" + i);
        debug("process: ch=" + current.ch + ", sch=" + current.sch);
        if (current.sch == null) { // no special char just [a-z] case
            if (i >= n || test[i] != current.ch) { // input exhausted, or test char
                                            // and current state char differ
                return false;
            } else {
                i++;
                current = current.next();
                continue;
            }
        } else if (current.sch == '^') { // process start char
            if (i == 0 && test[i] == current.ch) {
                i++;
                current = current.next();
                continue;
            } else {
                return false;
            }

        } else if (current.sch == '$') { // process end char
            if (i == n - 1 && test[i] == current.ch) {
                i++;
                current = current.next();
                continue;
            } else {
                return false;
            }

        } else if (current.sch == '*') { // process repeat char
            if (letters(current.ch)) { // like a* or b*
                while (i < n && test[i] == current.ch)
                    i++; // move i till end of repeat char
                current = current.next();
                continue;
            } else if (current.ch == '.') { // like .*
                Node nextNode = current.next();
                print(nextNode);
                if (nextNode != null) {
                    Character nextChar = nextNode.ch;
                    Character nextSChar = nextNode.sch;
                    // a.*z = az or (you need to check the next state in the
                    // list)
                    if (test[i] == nextChar) { // test [i] == 'z'
                        i++;
                        current = current.next();
                        continue;
                    } else {
                        // a.*z = abz or
                        // a.*z = abbz
                        char tch = test[i]; // get 'b'
                        while (i + 1 < n && test[++i] == tch)
                            ; // move i till end of repeat char
                        current = current.next();
                        continue;
                    }
                } else { // trailing .* consumes the rest of the input
                    i = n;
                    current = current.next();
                    continue;
                }
            } else { // like $* or ^*
                debug("process: return false-1");
                return false;
            }

        } else if (current.sch == '.') { // process any char
            if (i >= n || !letters(test[i])) {
                return false;
            }
            i++;
            current = current.next();
            continue;
        }
    }

    if (i == n && current == null) {
        // string position is out of bound
        // list is at end ie. exhausted both expression and input
        // FSM reached the end state, hence the input is valid and matches the given expression 
        return true;
    } else {
        return false;
    }
}

public void debug(Object str) {
    boolean debug = false;
    if (debug) {
        System.out.println("[debug] " + str);
    }
}

private void print(Node node) {
    StringBuilder sb = new StringBuilder();
    while (node != null) {
        sb.append(node + " ");
        node = node.next();
    }
    sb.append("\n");
    debug(sb.toString());
}

public boolean match(String expr, String input) {
    boolean result = process(expr, input);
    System.out.printf("\n%-20s %-20s %-20s\n", expr, input, result);
    return result;
}

public void runMatchTests() {
    match("ab", "ab");
    match("a*b", "aaaaaab");
    match("a*b*c*", "abc");
    match("a*b*c", "aaabccc");
    match("^abc*b", "abccccb");
    match("^abc*b", "abccccbb");
    match("^abcd$", "abcd");
    match("^abc*abc$", "abcabc");
    match("^abc.abc$", "abczabc");
    match("^ab..*abc$", "abyxxxxabc");
    match("a*b*", ""); // handles empty input string
    match("xyza*b*", "xyz");
}}

Answer 4

#include <stdio.h>

/* Matches the remaining pattern against the remaining text.
 * The whole text must be consumed, which is what the sample table in the question implies. */
static int match_here(const char *reg, const char *test) {
        if (*reg == '\0' || (*reg == '$' && reg[1] == '\0'))
                return *test == '\0';   /* pattern exhausted: succeed only at end of text */
        if (reg[1] == '*') {
                /* x*: try zero occurrences first, then consume one matching character at a time */
                for (;;) {
                        if (match_here(reg + 2, test))
                                return 1;
                        if (*test == '\0' || (*reg != '.' && *reg != *test))
                                return 0;
                        test++;
                }
        }
        if (*test != '\0' && (*reg == '.' || *reg == *test))
                return match_here(reg + 1, test + 1);   /* single-character match */
        return 0;
}

int regex_validate(const char *reg, const char *test) {
        if (*reg == '^')        /* matching always starts at the first character, so '^' adds nothing */
                reg++;
        return match_here(reg, test);
}

int main () {
        printf("regex=%d\n", regex_validate("ab", "ab"));
        printf("regex=%d\n", regex_validate("a*b", "aaaaaab"));
        printf("regex=%d\n", regex_validate("^abc.abc$", "abcdabc"));
        printf("regex=%d\n", regex_validate("^abc*abc$", "abcabc"));
        printf("regex=%d\n", regex_validate("^abc*b", "abccccb"));
        printf("regex=%d\n", regex_validate("^abc*b", "abbccccb"));
        return 0;
}

Interview: machine coding / regular expressions (a better alternative to my solution)

4 answers: