Check if a given string follows a given pattern

Time: 2014-11-02 18:24:16

Tags: string algorithm dynamic-programming graph-algorithm

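A friend of mine just interviewed at Google and was rejected because he couldn't solve this question.

I have an interview of my own in a few days and can't seem to figure out a way to solve it.

Here is the question:

You are given a pattern, such as [a b a b]. You are also given a string, for example "redblueredblue". I need to write a program that tells whether or not the string follows the given pattern.

A few examples:

Pattern: [a b b a] String: catdogdogcat returns 1

Pattern: [a b a b] String: redblueredblue returns 1

Pattern: [a b b a] String: redblueredblue returns 0

I thought of a few approaches, such as getting the number of unique characters in the pattern, finding that many unique substrings of the string, and then comparing them with the pattern using a hashmap. However, this breaks down when the substring for a is part of the substring for b.

It would be really great if any of you could help me out. :)

UPDATE:

Added info: there can be any number of characters (a-z) in the pattern. Two characters won't represent the same substring. Also, a character can't represent an empty string.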

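16 Answers:

Answer 0 (score: 18)

The simplest solution I can think of is to divide the given string into four parts and compare the individual parts. You don't know how long a or b is, but both a's have the same length, and so do both b's. So the number of ways to divide the given string is not very large.

Example 1: pattern = [a b a b], given string = redblueredblue (14 characters in total)

  1. |a| (the length of a) = 1, so the a's take 2 characters and 12 characters are left for the b's, i.e. |b| = 6. Divided string = r edblue r edblue. Whoa, this matches right away!
  2. (just out of curiosity) |a| = 2, |b| = 5 -> divided string = re dblue re dblue -> match

Example 2: pattern = [a b a b], string = redbluebluered (14 characters in total)

  1. |a| = 1, |b| = 6 -> divided string = r edblue b luered -> no match
  2. |a| = 2, |b| = 5 -> divided string = re dblue bl uered -> no match
  3. |a| = 3, |b| = 4 -> divided string = red blue blu ered -> no match

The rest does not need to be checked, because if you swap a for b and vice versa, the situation is identical.

What about a pattern like [a b c a b c]?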
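For longer patterns such as [a b c a b c], the same idea can be generalized: pick a length for every distinct pattern symbol so that the lengths add up to the length of the string, then cut the string accordingly and compare the pieces. The following is only a minimal Python sketch of that idea, not the answerer's code; the names follows_pattern, check and choose are made up for illustration, and the pattern is assumed to be passed as a plain string such as "abcabc".

```python
from collections import Counter


def follows_pattern(pattern, text):
    """Return True if text can be cut into pieces that follow pattern."""
    symbols = list(dict.fromkeys(pattern))  # distinct symbols, first-seen order
    counts = Counter(pattern)               # how often each symbol occurs

    def check(lengths):
        # Cut text using the chosen length of every symbol and compare pieces.
        mapping, pos = {}, 0
        for sym in pattern:
            piece = text[pos:pos + lengths[sym]]
            if mapping.setdefault(sym, piece) != piece:
                return False
            pos += lengths[sym]
        # Different symbols must stand for different substrings.
        return len(set(mapping.values())) == len(mapping)

    def choose(idx, remaining, lengths):
        # remaining = number of characters not yet covered by chosen lengths
        if idx == len(symbols):
            return remaining == 0 and check(lengths)
        sym = symbols[idx]
        length = 1
        while counts[sym] * length <= remaining:
            lengths[sym] = length
            if choose(idx + 1, remaining - counts[sym] * length, lengths):
                return True
            length += 1
        return False

    return choose(0, len(text), {})
```

With two distinct symbols this reduces to the walk over |a| and |b| shown above; with more symbols the number of length combinations grows quickly, so this is still a brute-force search.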

Answer 1 (score: 8)

Don't you just need to translate the pattern to a regexp using backreferences, i.e. something like this (Python 3 with the "re" module loaded):

>>> print(re.match('(.+)(.+)\\2\\1', 'catdogdogcat'))
<_sre.SRE_Match object; span=(0, 12), match='catdogdogcat'>

>>> print(re.match('(.+)(.+)\\1\\2', 'redblueredblue'))
<_sre.SRE_Match object; span=(0, 14), match='redblueredblue'>

>>> print(re.match('(.+)(.+)\\2\\1', 'redblueredblue'))
None

The regexp looks pretty trivial to generate. If you need to support more than 9 backrefs, you can use named groups - see the Python regexp docs.
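Generating such a regexp could look roughly like the sketch below. It uses named groups throughout and re.fullmatch, so that the whole string has to be consumed (re.match only anchors at the start). The helper names pattern_to_regex and follows_pattern_regex and the group names g0, g1, ... are invented for this example. Note that this sketch does not enforce the rule from the question's update that two pattern characters must not stand for the same substring.

```python
import re


def pattern_to_regex(pattern):
    """Build a backreference regex source for a pattern such as 'abba'."""
    group_of = {}  # pattern symbol -> name of the group that captured it
    parts = []
    for sym in pattern:
        if sym in group_of:
            # symbol seen before: refer back to its captured text
            parts.append("(?P={})".format(group_of[sym]))
        else:
            # first occurrence: capture one or more characters
            name = "g{}".format(len(group_of))
            group_of[sym] = name
            parts.append("(?P<{}>.+)".format(name))
    return "".join(parts)


def follows_pattern_regex(pattern, text):
    # fullmatch anchors both ends, so leftover characters cause a failure
    regex = pattern_to_regex(pattern)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None
```

For "abba" the generated source is (?P<g0>.+)(?P<g1>.+)(?P=g1)(?P=g0). Because distinctness is not checked, a call such as follows_pattern_regex("ab", "xx") succeeds with both groups capturing "x".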

Answer 2 (score: 2)

Here is a Java backtracking solution. Source link

import java.util.HashMap;
import java.util.Map;

public class Solution {

    public boolean isMatch(String str, String pat) {
        Map<Character, String> map = new HashMap<>();
        return isMatch(str, 0, pat, 0, map);
    }

    boolean isMatch(String str, int i, String pat, int j, Map<Character, String> map) {
        // base case: both exhausted is a match, only one exhausted is not
        if (i == str.length() && j == pat.length()) return true;
        if (i == str.length() || j == pat.length()) return false;

        // get current pattern character
        char c = pat.charAt(j);

        // if the pattern character is already mapped
        if (map.containsKey(c)) {
            String s = map.get(c);

            // then check whether it matches str starting at index i
            if (!str.startsWith(s, i)) {
                return false;
            }

            // if it can match, great, continue to match the rest
            return isMatch(str, i + s.length(), pat, j + 1, map);
        }

        // pattern character is not in the map yet: try every non-empty candidate
        for (int k = i; k < str.length(); k++) {
            String candidate = str.substring(i, k + 1);

            // two pattern characters may not stand for the same substring
            if (map.containsValue(candidate)) {
                continue;
            }

            // create or update the mapping
            map.put(c, candidate);

            // continue to match the rest
            if (isMatch(str, k + 1, pat, j + 1, map)) {
                return true;
            }
        }

        // we've tried our best but still no luck: undo the mapping
        map.remove(c);

        return false;
    }
}

Answer 3 (score: 1)

Another brute-force recursive solution:

import java.io.IOException;
import java.util.*;

public class Test {

    public static void main(String[] args) throws IOException {
        int res;
        res = wordpattern("abba", "redbluebluered");
        System.out.println("RESULT: " + res);
    }

    static int wordpattern(String pattern, String input) {
        int patternSize = 1;
        boolean res = findPattern(pattern, input, new HashMap<Character, String>(), patternSize);
        while (!res && patternSize < input.length())
        {
            patternSize++;
            res = findPattern(pattern, input, new HashMap<Character, String>(), patternSize);
        }
        return res ? 1 : 0;
    }

    private static boolean findPattern(String pattern, String input, Map<Character, String> charToValue, int patternSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (charToValue.containsKey(c)) {
                sb.append(charToValue.get(c));
            } else {
                // new character in pattern
                if (sb.length() + patternSize > input.length()) {
                    return false;
                } else {
                    String substring = input.substring(sb.length(), sb.length() + patternSize);
                    charToValue.put(c, substring);
                    int newPatternSize = 1;
                    boolean res = findPattern(pattern, input, new HashMap<>(charToValue), newPatternSize);
                    while (!res && newPatternSize + sb.length() + substring.length() < input.length() - 1) {
                        newPatternSize++;
                        res = findPattern(pattern, input, new HashMap<>(charToValue), newPatternSize);
                    }
                    return res;
                }
            }
        }
        return sb.toString().equals(input) && allValuesUniq(charToValue.values());
    }

    private static boolean allValuesUniq(Collection<String> values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            if (!set.add(v)) {
                return false;
            }
        }
        return true;
    }
}

Answer 4 (score: 1)

My implementation in C#. I tried looking for something clean in C# and couldn't find it, so I'm adding it here.

   private static bool CheckIfStringFollowOrder(string text, string subString)
    {
        int subStringLength = subString.Length;

        if (text.Length < subStringLength) return false;

        char x, y;
        int indexX, indexY;

        for (int i=0; i < subStringLength -1; i++)
        {
            indexX = -1;
            indexY = -1;

            x = subString[i];
            y = subString[i + 1];

            indexX = text.LastIndexOf(x);
            indexY = text.IndexOf(y);

            if (y < x || indexX == -1 || indexY == -1)
                return false;
        }

        return true;

    }

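Answer 5 (score: 0)

@EricM

I tested your DFS solution and it seems to be wrong, for a case like:

pattern = ["a", "b", "a"], s = "patrpatrr"

The problem is that when you meet a pattern that already exists in the dict and find that it can't fit the following string, you delete it and try to assign it a new value. However, you never check this pattern with its new value against the places where it occurred earlier.

My idea is to add a new value to the dict (or merge it into this dict) to keep track of where each pattern first appears, plus a stack to keep track of the unique patterns I have met. When a "no match" occurs, I know there is a problem with the last pattern, so I pop it from the stack, modify the corresponding value in the dict, and start checking again from the corresponding index. If it can't be modified any further, I keep popping until nothing is left in the stack, and then return False.

(I wanted to add a comment, but as a new user I don't have enough reputation. I haven't implemented this yet, but so far I haven't found an error in my logic. I'm sorry if there is something wrong with my solution; I will try to implement it later.)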
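The bookkeeping described above was never implemented in this answer. One way to avoid the issue, shown here only as a hedged Python sketch and not as the author's stack-based design, is to let the recursion itself act as the stack: a symbol is assigned only at its first occurrence, later occurrences merely check the assignment, and the assignment is undone when the search backtracks past that first occurrence. The names follows_pattern_backtracking, mapping and used are invented for this sketch.

```python
def follows_pattern_backtracking(pattern, text):
    mapping = {}   # pattern symbol -> substring assigned at first occurrence
    used = set()   # substrings already taken by some symbol

    def solve(p, t):
        # p indexes into pattern, t indexes into text
        if p == len(pattern):
            return t == len(text)
        sym = pattern[p]
        if sym in mapping:
            word = mapping[sym]
            # an existing assignment is only checked here, never changed
            return text.startswith(word, t) and solve(p + 1, t + len(word))
        for end in range(t + 1, len(text) + 1):
            word = text[t:end]
            if word in used:
                continue  # another symbol already stands for this substring
            mapping[sym] = word
            used.add(word)
            if solve(p + 1, end):
                return True
            # undo the assignment before trying a longer candidate
            del mapping[sym]
            used.discard(word)
        return False

    return solve(0, 0)
```

For the test case above, follows_pattern_backtracking("aba", "patrpatrr") returns False, since no prefix of the string is also a suffix that leaves room for a non-empty b.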

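Answer 6 (score: 0)

I can't think of anything much better than the brute-force solution: try every possible partition of the string (this is basically what Jan described).

The runtime complexity is O(n^(2m)), where m is the length of the pattern and n is the length of the string.

Here's what the code looks like (I made my code return the actual mapping instead of just 0 or 1; modifying the code to return 0 or 1 is easy):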

import java.util.Arrays;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringBijection {
    public static void main(String[] args) {
        String chars = "abaac";
        String string = "johnjohnnyjohnjohncodes";
        List<String> stringBijection = getStringBijection(chars, string);

        System.out.println(Arrays.toString(stringBijection.toArray()));
    }

    public static List<String> getStringBijection(String chars, String string) {
        if (chars == null || string == null) {
            return null;
        }

        Map<Character, String> bijection = new HashMap<Character, String>();
        Deque<String> assignments = new ArrayDeque<String>();
        List<String> results = new ArrayList<String>();
        boolean hasBijection = getStringBijection(chars, string, 0, 0, bijection, assignments);

        if (!hasBijection) {
            return null;
        }

        for (String result : assignments) {
            results.add(result);
        }

        return results;
    }

    private static boolean getStringBijection(String chars, String string, int charIndex, int stringIndex, Map<Character, String> bijection, Deque<String> assignments) {
        int charsLen = chars.length();
        int stringLen = string.length();

        if (charIndex == charsLen && stringIndex == stringLen) {
            return true;
        } else if (charIndex == charsLen || stringIndex == stringLen) {
            return false;
        }

        char currentChar = chars.charAt(charIndex);
        List<String> possibleWords = new ArrayList<String>();
        boolean charAlreadyAssigned = bijection.containsKey(currentChar);

        if (charAlreadyAssigned) {
            String word = bijection.get(currentChar);
            possibleWords.add(word);
        } else {
            StringBuilder word = new StringBuilder();

            for (int i = stringIndex; i < stringLen; ++i) {
                word.append(string.charAt(i));
                possibleWords.add(word.toString());
            }
        }

        for (String word : possibleWords) {
            int wordLen = word.length();
            int endIndex = stringIndex + wordLen;

            if (endIndex <= stringLen && string.substring(stringIndex, endIndex).equals(word)) {
                if (!charAlreadyAssigned) {
                    bijection.put(currentChar, word);
                }

                assignments.addLast(word);

                boolean done = getStringBijection(chars, string, charIndex + 1, stringIndex + wordLen, bijection, assignments);

                if (done) {
                    return true;
                }

                assignments.removeLast();

                if (!charAlreadyAssigned) {
                    bijection.remove(currentChar);
                }
            }
        }

        return false;
    }
}
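To get a feel for how many partitions a brute-force search may have to look at: a string of length n can be cut into m non-empty consecutive pieces in C(n-1, m-1) ways, because m-1 cut points are chosen among the n-1 gaps between characters. The sketch below counts and enumerates these raw partitions in Python; it is an illustration added alongside this answer, with made-up helper names count_splits and enumerate_splits, and it says nothing about the O(n^(2m)) bound quoted above. It assumes Python 3.8+ for math.comb.

```python
from itertools import combinations
from math import comb


def count_splits(n, m):
    """Number of ways to cut a length-n string into m non-empty pieces."""
    return comb(n - 1, m - 1)


def enumerate_splits(text, m):
    """Yield every way to cut text into m non-empty consecutive pieces."""
    n = len(text)
    # choose m - 1 cut positions among the n - 1 gaps between characters
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0,) + cuts + (n,)
        yield [text[a:b] for a, b in zip(bounds, bounds[1:])]
```

The backtracking code in this answer visits far fewer candidates in practice, because a symbol that is already assigned leaves only one possible piece at its later positions.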

Answer 7 (score: 0)

If you are looking for a C++ solution, here is a brute-force one: https://linzhongzl.wordpress.com/2014/11/04/repeating-pattern-match/

Answer 8 (score: 0)

Plain brute force; not sure whether any optimization is possible here.

import java.util.HashMap;
import java.util.Map;
import org.junit.*;

public class Pattern {
   private Map<Character, String> map;
   private boolean matchInt(String pattern, String str) {
      if (pattern.length() == 0) {
         return str.length() == 0;
      }
      char pch = pattern.charAt(0);
      for (int i = 0; i < str.length(); ++i) {
         if (!map.containsKey(pch)) {
            String val = str.substring(0, i + 1);
            map.put(pch, val);
            if (matchInt(pattern.substring(1), str.substring(val.length()))) {
               return true;
            } else {
               map.remove(pch);
            }
         } else {
            String val = map.get(pch);
            if (!str.startsWith(val)) {
               return false;
            }
            return matchInt(pattern.substring(1), str.substring(val.length()));
         }
      }
      return false;
   }
   public boolean match(String pattern, String str) {
      map = new HashMap<Character, String>();
      return matchInt(pattern, str);
   }
   @Test
   public void test1() {
      Assert.assertTrue(match("aabb", "ABABCDCD"));
      Assert.assertTrue(match("abba", "redbluebluered"));
      Assert.assertTrue(match("abba", "asdasdasdasd"));
      Assert.assertFalse(match("aabb", "xyzabcxzyabc"));
      Assert.assertTrue(match("abba", "catdogdogcat"));
      Assert.assertTrue(match("abab", "ryry"));
      Assert.assertFalse(match("abba", " redblueredblue"));
   }
}

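Answer 9 (score: 0)

#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

// shorthand for the pair members used below
#define fi first
#define se second

class StringPattern{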
public:
  int n, pn;
  string str;
  unordered_map<string, pair<string, int>> um;
  vector<string> p;
  bool match(string pat, string str_) {
    p.clear();
    istringstream istr(pat);
    string x;
    while(istr>>x) p.push_back(x);
    pn=p.size();
    str=str_;
    n=str.size();
    um.clear();
    return dfs(0, 0);
  }

  bool dfs(int i, int c) {
    if(i>=n) {
      if(c>=pn){
          return 1;
      }
    }
    if(c>=pn) return 0;
    for(int len=1; i+len-1<n; len++) {
      string sub=str.substr(i, len);


      if(um.count(p[c]) && um[p[c]].fi!=sub
         || um.count(sub) && um[sub].fi!=p[c]
         )
          continue;
      //cout<<"str:"<<endl;
      //cout<<p[c]<<" "<<sub<<endl;
      um[p[c]].fi=sub;
      um[p[c]].se++;
      um[sub].fi=p[c];
      um[sub].se++;
      //um[sub]=p[c];
      if(dfs(i+len, c+1)) return 1;
      um[p[c]].se--;
      if(!um[p[c]].se) um.erase(p[c]);
      um[sub].se--;
      if(!um[sub].se) um.erase(sub);
      //um.erase(sub);
    }
    return 0;
  }
};

This is my solution. It needs a two-way hashmap, and it also has to keep a usage count for each hashmap entry.
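The two-way mapping idea can be seen in isolation when the string has already been cut into pieces. The following Python sketch, with the invented name words_follow_pattern, checks a list of pieces against a pattern using one dictionary per direction; it is a simplified illustration and not a port of the C++ above, which additionally searches over all possible cuts.

```python
def words_follow_pattern(pattern, words):
    """Check that pattern symbols and words correspond one-to-one."""
    if len(pattern) != len(words):
        return False
    sym_to_word = {}  # forward direction: symbol -> word
    word_to_sym = {}  # reverse direction: word -> symbol
    for sym, word in zip(pattern, words):
        # setdefault stores the pair on first sight, then returns the stored value
        if sym_to_word.setdefault(sym, word) != word:
            return False  # same symbol, different word
        if word_to_sym.setdefault(word, sym) != sym:
            return False  # same word claimed by two symbols
    return True
```

The reverse dictionary is what rejects a split such as ["x", "x"] for the pattern "ab", where a one-way map would accept it.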

Answer 10 (score: 0)

My JavaScript solution:

function isMatch(pattern, str){

  var map = {}; //store the pairs of pattern and strings

  function checkMatch(pattern, str) {

    if (pattern.length == 0 && str.length == 0){
      return true;
    }
    //if the pattern or the string is empty
    if (pattern.length == 0 || str.length == 0){
      return false;
    }

    //store the next pattern
    var currentPattern = pattern.charAt(0);

    if (currentPattern in map){
        //the pattern has already been seen, check if there is a match with the string
        if (str.length >= map[currentPattern].length  && str.startsWith(map[currentPattern])){
          //there is a match, try all other possibilities
          return checkMatch(pattern.substring(1), str.substring(map[currentPattern].length));
        } else {
          //no match, return false
          return false;
        }
    }

    //the current pattern is new, try all the possibilities of the current string
    for (var i=1; i <= str.length; i++){
        var stringToCheck = str.substring(0, i);

        //store in the map
        map[currentPattern] = stringToCheck;
        //try the rest
        var match = checkMatch(pattern.substring(1), str.substring(i));
        if (match){
            //there is a match
             return true;
        } else {
           //if there is no match, delete the pair from the map
           delete map[currentPattern];
        }
    }
    return false;
  }

  return checkMatch(pattern, str);

}

Answer 11 (score: 0)

I used regexen, treating this as a language generation problem.

import re


def wordpattern(pattern, string):
    '''
        input: pattern 'abba'
        string  'redbluebluered'
        output: 1 for match, 0 for no match
    '''

    # assemble regex into something like this for 'abba':
    # '^(?P<A>.+)(?P<B>.+)(?P=B)(?P=A)$'
    p = pattern
    for c in pattern:
        C = c.upper()
        p = p.replace(c,"(?P<{0}>.+)".format(C),1)
        p = p.replace(c,"(?P={0})".format(C),len(pattern))
    p = '^' + p + '$'

    # check for a preliminary match
    if re.search(p,string):
        rem = re.match(p,string)
        seen = {}
        # check to ensure that no points in the pattern share the same match
        for c in pattern:
            s = rem.group(c.upper())
            # has match been seen? yes, fail, no continue
            if s in seen and seen[s] != c:
                return 0
            seen[s] = c
        # success
        return 1
    # did not hit the search, fail
    return 0

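Answer 12 (score: 0)

pattern - "abba"; input - "redbluebluered"

  1. Find the count of each unique character in pattern and assign them to the list pattern_count. Example: [2,2] for a and b.
  2. Assign pattern_lengths for each unique character. Example: [1,1].
  3. Iterate over the values of pattern_lengths from right to left, maintaining the equation pattern_count * (pattern_lengths)^T = length(input) (a dot product of vectors). Use a step to jump straight to the next root of the equation.
  4. When the equation holds, check whether the string follows the pattern with the current pattern_lengths (check_combination()).

    Python implementation: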

    def check(pattern, input):
        def _unique(pattern):
            hmap = {}
            for i in pattern:
                if i not in hmap:
                    hmap[i] = 1
                else:
                    hmap[i] += 1
            return list(hmap.keys()), list(hmap.values())
        def check_combination(pattern, string, pattern_unique, pattern_lengths):
            pos = 0
            hmap = {}
            _set = set()
            for code in pattern:
                string_value = string[pos:pos + pattern_lengths[pattern_unique.index(code)]]
                if code in hmap:
                    if hmap[code] != string_value:
                        return False
                else:
                    if string_value in _set:
                        return False
                    _set.add(string_value)
                    hmap[code] = string_value
                pos += len(string_value)
            return False if pos < len(string) else True
    
        pattern = list(pattern)
        pattern_unique, pattern_count = _unique(pattern)
        pattern_lengths = [1] * len(pattern_unique)
        x_len =  len(pattern_unique)
        i = x_len - 1
        while i>0:
            diff_sum_pattern = len(input) - sum([x * y for x, y in zip(pattern_lengths, pattern_count)])
            if diff_sum_pattern >= 0:
                if diff_sum_pattern == 0 and \
                   check_combination(pattern, input, pattern_unique, pattern_lengths):
                        return 1
                pattern_lengths[i] += max(1, diff_sum_pattern // pattern_count[i])
            else:
                pattern_lengths[i:x_len] = [1] * (x_len - i)
                pattern_lengths[i - 1] += 1
                sum_pattern = sum([x * y for x, y in zip(pattern_lengths, pattern_count)])
                if sum_pattern <= len(input):
                    i = x_len - 1
                else:
                    i -= 1
                    continue
        return 0
    
    task = ("abcdddcbaaabcdddcbaa","redbluegreenyellowyellowyellowgreenblueredredredbluegreenyellowyellowyellowgreenblueredred")
    print(check(*task))
    

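    On the sample pattern in this code (20 characters, 4 unique), this is 50000 times faster than plain brute force (DFS) with recursion (as implemented by @EricM) and 30 times faster than the regex approach (as implemented by @IknoweD).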

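Answer 13 (score: 0)

A Java solution I wrote (based on this HackerRank Dropbox Challenge practice).

You can play with the DEBUG_VARIATIONS and DEBUG_MATCH flags to better understand how the algorithm works.

It may be too late now, but you might want to try solving the problem on HackerRank yourself before reading the proposed solution! ;-)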

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {

    private static final boolean DEBUG_VARIATIONS = false;
    private static final boolean DEBUG_MATCH = true;

    static int wordpattern(final String pattern, final String input) {
        if (pattern.length() == 1) {
            return 1;
        }

        final int nWords = pattern.length();

        final List<List<String>> lists = split(input, nWords);

        for (final List<String> words : lists) {
            if (DEBUG_VARIATIONS) {
                System.out.print("-> ");
                for (int i = 0; i < words.size(); i++) {
                    System.out.printf("%s ", words.get(i));
                }
                System.out.println();
            }

            if (matches(pattern, words)) {
                return 1;
            }
        }

        return 0;
    }

    // Return every possible way to split 'input' into 'n' parts
    private static final List<List<String>> split(final String input, final int n) {
        final List<List<String>> variations = new ArrayList<>();

        // Stop recursion when n == 2
        if (n == 2) {
            for (int i = 1; i < input.length(); i++) {
                final List<String> l = new ArrayList<>();
                l.add(input.substring(0, i));
                l.add(input.substring(i));
                variations.add(l);
            }
            return variations;
        }

        for (int i = 1; i < input.length() - n + 1; i++) {
            final List<List<String>> result = split(input.substring(i), n - 1);
            for (List<String> l : result) {
                l.add(0, input.substring(0, i));
            }
            variations.addAll(result);
        }

        return variations;
    }

    // Return 'true' if list of words matches patterns
    private static final boolean matches(final String pattern, final List<String> words) {
        final Map<String, String> patterns = new HashMap<>();

        for (int i = 0; i < pattern.length(); i++) {
            final String key = String.valueOf(pattern.charAt(i));
            final String value = words.get(i);

            boolean hasKey = patterns.containsKey(key);
            boolean hasValue = patterns.containsValue(value);

            if (!hasKey && !hasValue) {
                patterns.put(key, value);
            } else if (hasKey && !hasValue) {
                return false;
            } else if (!hasKey && hasValue) {
                return false;
            } else if (hasKey && hasValue) {
                if (!value.equals(patterns.get(key))) {
                    return false;
                }
            }
        }

        if (DEBUG_MATCH) {
            System.out.print("Found match! -> ");
            for (int i = 0; i < words.size(); i++) {
                System.out.printf("%s ", words.get(i));
            }
            System.out.println();
        }

        return true;
    }

    public static void main(final String[] args) {
        System.out.println(wordpattern("abba", "redbluebluered"));
    }
}

Answer 14 (score: 0)

Recursively check every combination.

#include <bits/stdc++.h>
using namespace std;

/**
 * Given a string and a pattern, check if the whole string is following the given pattern.
 * e.g.
 * string            pattern     return
 * redblueredblue     abab        a:red, b:blue  true
 * redbb               aba          false
 * 
 * Concept:
 * Recursively checking
 * point_pat:0 point_str:0 a:r point_pat:1 point_str:1 b:e/ed/edb...
 * point_pat:0 point_str:1 a:re point_pat:1 point_str:2 b:d/db/dbl...
 */

bool isMatch(const string &str, const string &pattern, unordered_map<char, string> &match_table, int point_str, int point_pat)
{
    if (point_pat >= pattern.size() && point_str >= str.size())
        return true;
    if (point_pat >= pattern.size() || point_str >= str.size())
        return false;

    if (match_table.count(pattern[point_pat]))
    {
        auto &match_str = match_table[pattern[point_pat]];
        if (str.substr(point_str, match_str.size()) == match_str)
            return isMatch(str, pattern, match_table, point_str + match_str.size(), point_pat + 1);
        else
            return false;
    }
    else
    {
        for (int len = 1; len <= str.size() - point_str; ++len)
        {
            match_table[pattern[point_pat]] = str.substr(point_str, len);
            if (isMatch(str, pattern, match_table, point_str + len, point_pat + 1))
            {
                return true;
            }
        }
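        // no length worked: drop the stale mapping before backtracking
        match_table.erase(pattern[point_pat]);
        return false;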
    }
}

bool isMatch(const string &str, const string &pattern)
{
    unordered_map<char, string> match_table;

    bool res = isMatch(str, pattern, match_table, 0, 0);

    for (const auto &p : match_table)
    {
        cout << p.first << " : " << p.second << "\n";
    }
    return res;
}

int main()
{
    string str{"redblueredblue"}, pattern{"abab"};
    cout << isMatch(str, pattern) << "\n";
    cout << isMatch(str, "ab") << "\n";
    cout << isMatch(str, "ababa") << "\n";
    cout << isMatch(str, "cba") << "\n";
    cout << isMatch(str, "abcabc") << "\n";
    cout << isMatch("patrpatrr", "aba") << "\n";
}

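Answer 15 (score: -1)

Depending on the pattern given, you can answer a "different" question (which is really the same question).

For a pattern like [a b b a], determine whether or not the string is a palindrome.

For a pattern like [a b a b], determine whether the second half of the string equals the first half of the string.

Longer patterns like [a b c b c a] are possible, but you still break them up into smaller problems to solve. For this one, you know that the last n characters of the string should be the reverse of the first n characters. Once they stop being equal, you simply have another [b c b c] problem to check.

Although it's possible, I doubt that in an interview they would give you anything more complex than 3-4 different substrings.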