Question

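How can I find out whether any anagram of String 1 is a substring of String 2?

For example:

String 1 = rove

String 2 = stackoverflow

So it would return true, because an anagram of "rove" is "over", which is a substring of String 2.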

Answer 1

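Edit: my first answer was quadratic in the worst case. I've tweaked it to be strictly linear:

Here is an approach based on the notion of a sliding window: create a dictionary keyed by the letters of the first string, with the frequency counts of those letters as the corresponding values. Think of this as a dictionary of targets that need to be matched by m consecutive letters in the second string, where m is the length of the first string.

Start by processing the first m letters of the second string. For each such letter, if it appears as a key in the target dictionary, decrease the corresponding value by 1. The goal is to drive all target values to 0. Define discrepancy to be the sum of the absolute values of the values after processing the first window of m letters.

Repeatedly do the following: check whether discrepancy == 0 and return True if it is. Otherwise, take the character m letters back and check whether it is a target key; if so, increase the value by 1. In that case, this either increases or decreases the discrepancy by 1, so adjust it accordingly. Then get the next character of the second string and process it as well: check whether it is a key in the dictionary and, if so, adjust the value and the discrepancy as appropriate.

Since there are no nested loops, and each pass through the main loop involves just a few dictionary lookups, comparisons, additions and subtractions, the overall algorithm is linear.

A Python 3 implementation (showing the basic logic of how the window slides and how the target counts and the discrepancy are adjusted):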

def subAnagram(s1,s2):
    m = len(s1)
    n = len(s2)
    if m > n: return False
    target = dict.fromkeys(s1,0)
    for c in s1: target[c] += 1

    #process initial window
    for i in range(m):
        c = s2[i]
        if c in target:
            target[c] -= 1
    discrepancy = sum(abs(target[c]) for c in target)

    #repeatedly check then slide:
    for i in range(m,n):
        if discrepancy == 0:
            return True
        else:
            #first process letter from m steps ago from s2
            c = s2[i-m]
            if c in target:
                target[c] += 1
                if target[c] > 0: #just made things worse
                    discrepancy +=1
                else:
                    discrepancy -=1
            #now process new letter:
            c = s2[i]
            if c in target:
                target[c] -= 1
                if target[c] < 0: #just made things worse
                    discrepancy += 1
                else:
                    discrepancy -=1
    #if you get to this stage:
    return discrepancy == 0

Typical output:

>>> subAnagram("rove", "stack overflow")
True
>>> subAnagram("rowe", "stack overflow")
False
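To make the discrepancy bookkeeping described above easier to follow, here is a small sketch that records the discrepancy for every window instead of returning early. The function name `discrepancy_trace` is hypothetical and not part of the answer's code; it simply reuses the same update rules.

```python
def discrepancy_trace(s1, s2):
    """Return (window, discrepancy) pairs for every window of len(s1) in s2.

    Illustrative helper only: it applies the same update rules as the
    sliding-window approach, but keeps the whole history.
    """
    m, n = len(s1), len(s2)
    if m == 0 or m > n:
        return []

    # Target counts: how many of each letter of s1 are still unmatched.
    target = {}
    for c in s1:
        target[c] = target.get(c, 0) + 1

    # Process the initial window.
    for c in s2[:m]:
        if c in target:
            target[c] -= 1
    discrepancy = sum(abs(v) for v in target.values())
    trace = [(s2[:m], discrepancy)]

    # Slide the window one letter at a time.
    for i in range(m, n):
        out_c, in_c = s2[i - m], s2[i]
        if out_c in target:
            target[out_c] += 1
            discrepancy += 1 if target[out_c] > 0 else -1
        if in_c in target:
            target[in_c] -= 1
            discrepancy += 1 if target[in_c] < 0 else -1
        trace.append((s2[i - m + 1:i + 1], discrepancy))
    return trace


for window, d in discrepancy_trace("rove", "stackoverflow"):
    print(window, d)
```

For "rove" and "stackoverflow" the discrepancy starts at 4 for the window "stac" and reaches 0 at the window "over", which is the point where the linear algorithm would return True.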

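To stress-test it, I downloaded the complete text of Moby Dick from Project Gutenberg. This has over 1 million characters. "Formosa" is mentioned in the book, hence an anagram of "moors" appears as a substring of Moby Dick. But, not surprisingly, no anagram of "stackoverflow" appears in Moby Dick: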

>>> f = open("moby dick.txt")
>>> md = f.read()
>>> f.close()
>>> len(md)
1235186
>>> subAnagram("moors",md)
True
>>> subAnagram("stackoverflow",md)
False

The last call takes roughly 1 second to process the complete text of Moby Dick and verify that no anagram of "stackoverflow" appears in it.

Answer 2

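It can be done with O(n^3) pre-processing and O(k log k) per query, where n is the size of the "given string" (String 2 in your example) and k is the size of the query (String 1 in your example).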

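Pre-process:

For each substring s of string2: //O(n^2) of those
    sort s 
    store s in some data base (hash table, for example)

Query:

given a query q:
    sort q
    check if q is in the data base
    if it is - it's an anagram of some substring
    otherwise - it is not.

This answer assumes you are going to check multiple "queries" (String 1's) against a single string (String 2), and thus tries to optimize the complexity of each query.

As discussed in the comments, you can do the pre-processing step lazily: the first time you encounter a query of length k, insert all substrings of length k into the data structure, then proceed as in the original suggestion.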
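The lazy variant can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `AnagramIndex` and its methods are hypothetical, and a per-length `set` of sorted substrings stands in for the "data base".

```python
class AnagramIndex:
    """Hypothetical helper: answers 'is some anagram of the query a substring of text?'"""

    def __init__(self, text):
        self.text = text
        self._by_length = {}  # k -> set of sorted substrings of length k

    def _signatures(self, k):
        # Lazy pre-processing: index the length-k substrings only when a
        # query of length k is seen for the first time.
        if k not in self._by_length:
            n = len(self.text)
            self._by_length[k] = {
                "".join(sorted(self.text[i:i + k])) for i in range(n - k + 1)
            }
        return self._by_length[k]

    def contains_anagram_of(self, query):
        # Sorting the query costs O(k log k); the lookup is a hash probe.
        key = "".join(sorted(query))
        return key in self._signatures(len(query))


index = AnagramIndex("stackoverflow")
print(index.contains_anagram_of("rove"))  # True
print(index.contains_anagram_of("rowe"))  # False
```

The first query of a given length pays for indexing all substrings of that length; later queries of the same length only pay for sorting the query and one set lookup.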

Answer 3

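Let L be the length of String1.

Loop through String2 and check whether each substring of length L is an anagram of String1.

In your example, String1 = rove and String2 = stackoverflow.

[stac]koverflow

stac and rove are not anagrams, so move on to the next substring of length L.

s[tack]overflow

tack and rove are not anagrams, and so on until you find the substring.

A faster approach is to check whether the last letter of the current substring is present in String1. That is, once you find that stac and rove are not anagrams, and you see that 'c' (the last letter of the current substring) is not present in rove, you can skip that substring entirely and take the next substring starting from 'k'.

i.e. [stac]koverflow

stac and rove are not anagrams. 'c' is not present in 'rove', so simply skip over this substring and check starting from 'k':

stac[kove]rflow

This significantly reduces the number of comparisons.

Edit

Here is a Python 2 implementation of the approach described above.

Note: this implementation works under the assumption that all characters in both strings are lowercase and that they consist only of the characters a-z.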

def isAnagram(s1, s2):
    c1 = [0] * 26
    c2 = [0] * 26

    # increase character counts for each string
    for i in s1:
        c1[ord(i) - 97] += 1
    for i in s2:
        c2[ord(i) - 97] += 1

    # if the character counts are same, they are anagrams
    if c1 == c2:
        return True
    return False

def isSubAnagram(s1, s2):
    l = len(s1)

    # s2[start:end] represents the substring in s2
    start = 0
    end = l

    while(end <= len(s2)):
        sub = s2[start:end]
        if isAnagram(s1, sub):
            return True
        elif sub[-1] not in s1:
            start += l
            end += l
        else:
            start += 1
            end += 1
    return False

Output:

>>> print isSubAnagram('rove', 'stackoverflow')
True
>>> print isSubAnagram('rowe', 'stackoverflow')
False

Answer 4

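You would probably need to create all possible permutations of String1, such as rove, rvoe, reov, and so on. Then check whether any of these permutations is in String2.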
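A minimal Python sketch of this brute-force idea, assuming the standard library's `itertools.permutations` is an acceptable way to enumerate the rearrangements (the function name `sub_anagram_bruteforce` is hypothetical). Note that a string of length m has up to m! permutations, so this is only practical for very short values of String1.

```python
from itertools import permutations


def sub_anagram_bruteforce(s1, s2):
    """Return True if some rearrangement of s1 occurs as a substring of s2.

    Enumerates every permutation of s1, so the running time grows
    factorially with len(s1).
    """
    return any("".join(p) in s2 for p in permutations(s1))


print(sub_anagram_bruteforce("rove", "stackoverflow"))  # True
print(sub_anagram_bruteforce("rowe", "stackoverflow"))  # False
```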

Answer 5

//Two strings are considered; check whether an anagram of the second string is
//present in the first string as part of it (substring)
//e.g. 'atctv' 'cat' will return true as 'atc' is anagram of cat
//Similarly 'battex' is containing an anagram of 'text' as 'ttex'

public class SubstringIsAnagramOfSecondString {

    public static boolean isAnagram(String str1, String str2){
        // Strings of different lengths can never be anagrams of each other.
        if(str1.length() != str2.length()) return false;
        Character[] charArr = new Character[str1.length()];

        for(int i = 0; i < str1.length(); i++){
            char ithChar1 = str1.charAt(i);
            charArr[i] = ithChar1;
        }
        for(int i = 0; i < str2.length(); i++){
            char ithChar2 = str2.charAt(i);
            for(int j = 0; j<charArr.length; j++){
                if(charArr[j] == null) continue;
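                if(charArr[j] == ithChar2){
                    charArr[j] = null;
                    // Consume only one matching character per letter of str2;
                    // otherwise repeated letters (e.g. "aab" vs "abc") match wrongly.
                    break;
                }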
            }
        }
        for(int j = 0; j<charArr.length; j++){
            if(charArr[j] != null)
                return false;
        }
        return true;
    }

    public static boolean isSubStringAnagram(String firstStr, String secondStr){
        int secondLength =  secondStr.length();
        int firstLength =  firstStr.length();
        if(secondLength == 0) return true;
        if(firstLength < secondLength || firstLength == 0) return false;
        //System.out.println("firstLength:"+ firstLength +" secondLength:" + secondLength+ 
                //" firstLength - secondLength:" + (firstLength - secondLength));

        for(int i = 0; i < firstLength - secondLength +1; i++){
            if(isAnagram(firstStr.substring(i, i+secondLength),secondStr )){
                return true;
            }
        }
        return false;

    }
    public static void main(String[] args) {
        System.out.println("isSubStringAnagram(xyteabc,ate): "+ isSubStringAnagram("xyteabc","ate"));

    }

}
