Question

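This question is related to a problem I am currently facing at work, but since it is fairly broad I have tried to frame it more like an interview question to encourage discussion.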

Suppose we have the following two strings:

str1 = 'helloworld'
str2 = 'helloldwor'

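I want to compare str1 and str2 and, on the assumption that str1 is "correct", determine which characters in str2 are out of order. You can also assume that str2 contains exactly the same characters as str1 (str2 is just a scrambled version of str1).

Edit: In this case, I would say that 'ld' is out of order. I define the "out-of-order" substring as the smallest substring of str2 that, if moved to the position that substring occupies in str1, would make str1 == str2.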
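
The definition above can be checked directly with a brute-force sketch. This is only an illustration of the definition, not an efficient algorithm: it tries every substring of str2 from shortest to longest, removes it, and tests whether reinserting it somewhere reproduces str1. The name smallest_moved_substring is hypothetical, and the sketch assumes that a single moved block is enough to repair str2.

```python
def smallest_moved_substring(str1, str2):
    """Return the shortest substring of str2 that, when moved, turns str2 into str1.

    Returns '' if the strings are already equal and None if no single
    moved block can repair str2 (assumption: one block move is the goal).
    """
    if str1 == str2:
        return ''
    n = len(str2)
    # Try the shortest pieces first so the first hit is a minimal one.
    for length in range(1, n):
        for start in range(n - length + 1):
            piece = str2[start:start + length]
            rest = str2[:start] + str2[start + length:]
            # Reinsert the piece at every possible position in the remainder.
            for pos in range(len(rest) + 1):
                if rest[:pos] + piece + rest[pos:] == str1:
                    return piece
    return None


print(smallest_moved_substring('helloworld', 'helloldwor'))  # ld
```

When several pieces of the same minimal length would work, this sketch returns the one that starts earliest in str2.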

This problem has bothered me for a long time: it is easy to work out visually, but I am struggling to turn it into an algorithm.

My attempt

def mid(s, start, length):
    # Assumed helper (not defined in the original post): VB-style Mid,
    # i.e. the substring of s that begins at start and has the given length.
    return s[start:start + length]

def get_ooo(str1, str2):
    # Candidate out-of-order substrings
    local_set = set()

    # Loop from len(str1) down to 2, splitting str2 into windows of size i
    for i in range(len(str1), 1, -1):
        print('Iteration #' + str(len(str1) - i))

        # Try to find each substring of str2 of length i in str1
        for j in range(0, len(str1) - i):
            candidate = mid(str2, j, i)
            if str1.find(candidate) < 0:
                # Failed to find the substring in str1.
                # Keep it only if it is a substring of every earlier failure.
                intersect = True
                for k in local_set:
                    if k.find(candidate) < 0:
                        intersect = False

                if intersect:
                    local_set.add(candidate)
                    print(candidate + ' - FAIL, PASS')
                else:
                    print(candidate + ' - FAIL, FAIL')
            else:
                print(candidate + ' - PASS')

    # Solution found? Pick the shortest candidate.
    best_option = ''
    for option in local_set:
        if len(option) < len(best_option) or best_option == '':
            best_option = option
    return best_option

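Essentially, the logic is to look for substrings of str2 in str1, starting from the largest possible substring. When I find one that does not occur in str1, I add it to a set of candidate solutions. If I then find another substring that does not occur in str1, I add it to the candidates only if it is also a substring of every other candidate. The smallest substring in the final set will therefore contain the first out-of-order character.

So with this algorithm I always know where the out-of-order section starts. However, I have no idea how to actually extract the out-of-order section.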
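
For comparison, the start of the scrambled region (and its end) can also be found by trimming the common prefix and common suffix of the two strings. The sketch below is not the approach used in get_ooo; the name mismatch_window is hypothetical and the code assumes both strings have the same length. It only bounds the region that differs and does not by itself say which part of that region was moved.

```python
def mismatch_window(str1, str2):
    """Return (start, end) so that str2[start:end] is the span differing from str1.

    Assumes len(str1) == len(str2); identical strings give an empty window.
    """
    if len(str1) != len(str2):
        raise ValueError('strings must have the same length')
    n = len(str1)
    # Skip the common prefix.
    start = 0
    while start < n and str1[start] == str2[start]:
        start += 1
    # Skip the common suffix, without crossing the prefix boundary.
    end = n
    while end > start and str1[end - 1] == str2[end - 1]:
        end -= 1
    return start, end


start, end = mismatch_window('helloworld', 'helloldwor')
print(start, end, 'helloldwor'[start:end])  # 5 10 ldwor
```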

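I tried passing the reversed strings to the function, which gives me the first out-of-order character counting from the back of the string and, in turn, the complete out-of-order substring. But what if more than one section is scrambled? Also, from my testing, this script only returns the first place where a substring of str2 is out of order. For example: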

str1 = 'helloworld'
str2 = 'hworldello'

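returns 'hw', telling me that 'w' is where the string goes out of order. In this example, though, it would make more sense to say that 'ello' is out of order rather than the 'world' substring.

I have been staring at this problem for more than a day and decided to step back and get some other opinions, especially since I feel there must be a better way. So what do you think? Any good ideas?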

Answer 1

You can use recursion and search the string both forwards and backwards to ensure that the smallest substring is returned:

def find_substring_forwards(str1, str2, storage, index):

    global main1, main2

    for i, (x, y) in enumerate(zip(str1, str2)):
        if x!=y and not storage: index = i
        expected_letter = main1[main1.index(''.join(storage))+len(storage)]
        if (x!=y or (expected_letter==x and main1.count(expected_letter)>1)) and index==i:
            storage.append(y)
            str2 = str2[:i] + str2[i+1:]
            if str1[:len(str2)]==str2: break
            return find_substring_forwards(str1, str2, storage, index)

    return ''.join(storage)

def find_substring_backwards(str1, str2, storage, index):

    global main1, main2

    for i, (x, y) in enumerate(zip(str1, str2)):
        if x!=y and not storage: index = i
        if x!=y and index==i:
            storage.append(y)
            str2 = str2[:i] + str2[i+1:]
            if str1[:len(str2)]==str2: break
            return find_substring_backwards(str1, str2, storage, index)

    return ''.join(storage)

def out_of_order(str1, str2):

    x = ''.join(find_substring_forwards(str1, str2, [], None))
    y = ''.join(find_substring_backwards(str1[::-1], str2[::-1], [], None)[::-1])
    final = x if len(x)<=len(y) else y

    return final

Some test cases:

test_cases = [('helloworld','heworldllo','llo'),
            ('helloworld','hwoellorld','wo'),
            ('helloworld','hworldello','ello'),
            ('helloworld','helloldwor','ld'),
            ('helloworld','helloowrld','o'),
            ('helloworld','whelloorld','w')]

for test in test_cases:
    main1 = test[0]; main2 = test[1]
    x = out_of_order(main1, main2)
    print(main1, '|', main2)
    print('Expected:', test[2], '| Returned:', x, '\n')

Output:

helloworld | heworldllo
Expected: llo | Returned: llo 

helloworld | hwoellorld
Expected: wo | Returned: wo 

helloworld | hworldello
Expected: ello | Returned: ello 

helloworld | helloldwor
Expected: ld | Returned: ld 

helloworld | helloowrld
Expected: o | Returned: o 

helloworld | whelloorld
Expected: w | Returned: w

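Explanation:

We know that str1 is the desired string, so we iterate over both strings simultaneously. When we find a letter in str2 that does not match the letter at the same position in str1, we add that letter to storage and note the index. We then remove the letter from the string and repeat the process. We continue this recursive loop until the index of the letter to be removed changes, which indicates that we have reached the end of the "out-of-order" substring. To ensure that we find the smallest substring (in terms of characters), we must run the same method in reverse (iterating backwards through the string). In the out_of_order function we simply take the smaller of the two substrings; if they are equal in length, we take the solution from the forward pass (since technically both are correct).

Update

There was previously a problem with repeated letters in the string for the following test case: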

('helloworld','heworldllo','llo')

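The algorithm now checks whether the letter that is about to be removed from the string is actually the expected next letter of the out-of-order substring. If so, it is added to the out-of-order substring's storage container rather than ending the substring search prematurely. The forward and backward iterations have also been separated into their own functions for clarity and readability.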
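
One way to see why repeated letters are awkward for a position-by-position comparison is to mark which indices happen to match. This is a small illustrative sketch, and match_flags is a hypothetical helper rather than part of the functions above. For this test case, index 8 matches by coincidence ('l' in both strings) even though it lies inside the displaced region.

```python
def match_flags(str1, str2):
    """Mark each index with '=' where the strings agree and 'x' where they differ."""
    return ''.join('=' if a == b else 'x' for a, b in zip(str1, str2))


print('helloworld')
print('heworldllo')
print(match_flags('helloworld', 'heworldllo'))  # ==xxxxxx=x
```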
