Question

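I'm new to programming, and I'm trying to solve a variant of the longest common subsequence/substring problem in Java. The problem I'm working on is to find the longest common substring without cutting words.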

For example: given string1 = He had 3 pennies and 5 quarters and string2 = q3nniesp, the result should be pennies.

Another example: for string1 = They named the new place face cafe and string2 = e face, the output would be e face cafe.

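I've tried to work out this algorithm, but I can't decide whether I need to convert the strings to char arrays or work with them as Strings. The fact that both strings can contain spaces confuses me.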
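One possible reading of "without cutting words", which the second example above does not fully match, treats spaces purely as word separators and keeps a word only if every character in it occurs somewhere in string2. The sketch below follows that reading; the class and method names are hypothetical, and the rule that consecutive words may be joined only when string2 itself contains a space is an assumption. With this approach there is no need to choose between a char array and a String: String.split deals with the spaces, and each word is then checked character by character.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for a strict "whole words only" reading of the problem.
// Assumption: a word qualifies only if each of its characters occurs in sub,
// and qualifying neighbours are joined only if sub itself contains a space.
class WholeWordMatch {

    static String longestWholeWordRun(String text, String sub) {
        Set<Character> allowed = new HashSet<>();
        for (char c : sub.toCharArray()) {
            allowed.add(c);
        }
        boolean spaceAllowed = allowed.contains(' ');

        String best = "";
        StringBuilder run = new StringBuilder();
        for (String word : text.split(" ")) {
            boolean ok = !word.isEmpty()
                    && word.chars().allMatch(ch -> allowed.contains((char) ch));
            if (!ok) {
                run.setLength(0); // a disallowed word breaks the current run
                continue;
            }
            if (run.length() > 0 && spaceAllowed) {
                run.append(' '); // extend the current run across the space
            } else {
                run.setLength(0); // no run yet, or spaces not allowed: start fresh
            }
            run.append(word);
            if (run.length() > best.length()) {
                best = run.toString(); // strict '>' keeps the first of equal-length runs
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestWholeWordRun("He had 3 pennies and 5 quarters", "q3nniesp")); // pennies
        System.out.println(longestWholeWordRun("They named the new place face cafe", "e face")); // face cafe
    }
}
```

Under this reading the first example gives pennies, but the second gives face cafe rather than e face cafe, because that e is the tail of place, which also contains p and l.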

I followed some existing Stack Overflow questions and tried to adapt this code from https://www.geeksforgeeks.org/:

static String findLongestSubsequence(String str1, String str2) {

        // Check for null first: str1.toCharArray() would throw before a later check could run
        if (str1 == null || str2 == null) return null;
        char[] A = str1.toCharArray();
        char[] B = str2.toCharArray();

        final int n = A.length;
        final int m = B.length;

        if (n == 0 || m == 0) return null;

        int[][] dp = new int[n+1][m+1];

        // Suppose A = a1a2..an-1an and B = b1b2..bm-1bm
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {

                // If the ends match, LCS(a1a2..an-1an, b1b2..bm-1bm) = LCS(a1a2..an-1, b1b2..bm-1) + 1
                if (A[i-1] == B[j-1]) dp[i][j] = dp[i-1][j-1] + 1;

                // If the ends do not match, the LCS of a1a2..an-1an and b1b2..bm-1bm is
                // max( LCS(a1a2..an-1, b1b2..bm-1bm), LCS(a1a2..an-1an, b1b2..bm-1) )
                else dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);

            }
        }

        int lcsLen = dp[n][m];
        char[] lcs = new char[lcsLen];
        int index = 0;

        // Backtrack to find a LCS. We search for the cells
        // where we included an element which are those with
        // dp[i][j] != dp[i-1][j] and dp[i][j] != dp[i][j-1])
        int i = n, j = m;
        while (i >= 1 && j >= 1) {

            int v = dp[i][j];

            // The order of these may output different LCSs
            while(i > 1 && dp[i-1][j] == v) i--;
            while(j > 1 && dp[i][j-1] == v) j--;

            // Make sure there is a match before adding
            if (v > 0) lcs[lcsLen - index++ - 1] = A[i-1]; // or B[j-1];

            i--; j--;

        }

        return new String(lcs, 0, lcsLen);
    }

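But I keep getting the wrong output. For example, the first example gives output = 3nnies. I'm really stuck at this point; could anyone lend a hand or take a stab at it? Thanks, everyone.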
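The code above computes the longest common subsequence, which may skip characters in both strings; that is why it returns 3nnies (the 3, then nnies, skipping " pe" in string1). Turning it into the contiguous longest common substring only requires resetting the count to 0 on a mismatch instead of taking the maximum of the neighbouring cells. The sketch below is that standard variant, shown for comparison only; the class name is made up for illustration.

```java
// Standard longest common SUBSTRING (contiguous) DP, for contrast with the
// subsequence version: on a mismatch dp[i][j] stays 0 instead of taking
// max(dp[i-1][j], dp[i][j-1]).
class LongestCommonSubstring {

    static String longestCommonSubstring(String a, String b) {
        int n = a.length(), m = b.length();
        // dp[i][j] = length of the longest common suffix of a[0..i) and b[0..j)
        int[][] dp = new int[n + 1][m + 1];
        int bestLen = 0, bestEnd = 0; // bestEnd is an exclusive end index into a
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > bestLen) {
                        bestLen = dp[i][j];
                        bestEnd = i;
                    }
                } // else dp[i][j] stays 0: a mismatch breaks the substring
            }
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("He had 3 pennies and 5 quarters", "q3nniesp")); // nnies
    }
}
```

For the first example this prints nnies, not pennies: pennies does not occur in q3nniesp as a contiguous run, so neither LCS formulation captures the intended rule, which depends on which characters string2 contains rather than on their order.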

Answer 1

Unfortunately, I tried your original algorithm, and it isn't heading in the right direction.

I'm assuming the following rules apply:

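The matching substring is made up of characters from the given substring, not necessarily in the same order.
A character from the given substring may appear more than once in the matching substring.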
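A minimal way to make these two assumptions concrete is a predicate that checks a candidate substring against them; the helper name usesOnlyCharsOf is hypothetical and is not part of the answer's code.

```java
// Hypothetical helper encoding the two assumptions: every character of the
// candidate must occur somewhere in sub (order is ignored), and a character
// may be reused any number of times (only presence is checked, not counts).
class MatchRules {

    static boolean usesOnlyCharsOf(String candidate, String sub) {
        for (char c : candidate.toCharArray()) {
            if (sub.indexOf(c) == -1) {
                return false; // c never appears in sub
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(usesOnlyCharsOf("pennies", "q3nniesp"));     // true
        System.out.println(usesOnlyCharsOf("quarters", "q3nniesp"));    // false
        System.out.println(usesOnlyCharsOf("ace face cafe", "e face")); // true
    }
}
```

Note that pennies passes even though q3nniesp contains only one e, which is the second assumption at work.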

So I took the liberty of using a brute-force algorithm with Java streams:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// complexity of O(N*M), where N is the length of the original string and M is the length of the substring
static String longestCommonSequence(String string, String sub) {
    List<Character> primaryMatch = new ArrayList<>();
    List<Character> secondaryMatch = new ArrayList<>();
    // N iterations loop on original string
    for(char c : string.toCharArray()) {
      // M iterations loop on the substring
      if(sub.indexOf(c) != -1) {
        primaryMatch.add(c);
      }
      else {
        if(!primaryMatch.isEmpty()) {
          // replace secondaryMatch content with the current longest match
          if(secondaryMatch.size() <= primaryMatch.size()) {
            secondaryMatch.clear();
            secondaryMatch.addAll(primaryMatch);
          }
          primaryMatch.clear();
        }
      }
    }
    if(primaryMatch.size() < secondaryMatch.size()) {
      return secondaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
    }
    return primaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
}
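The O(N*M) in the comment comes from sub.indexOf(c), which scans the substring once for every character of the original string. Here is a sketch of the same scan with the substring's characters loaded into a HashSet first, so each lookup is O(1) on average and the whole pass is O(N + M); the class name is hypothetical, and a StringBuilder stands in for the two List<Character> buffers.

```java
import java.util.HashSet;
import java.util.Set;

class LongestRunWithSet {

    static String longestCommonSequence(String string, String sub) {
        // Load the substring's characters once: O(M)
        Set<Character> allowed = new HashSet<>();
        for (char c : sub.toCharArray()) {
            allowed.add(c);
        }
        StringBuilder current = new StringBuilder(); // plays the role of primaryMatch
        String best = "";                            // plays the role of secondaryMatch
        // Single pass over the original string: O(N), with O(1) average lookups
        for (char c : string.toCharArray()) {
            if (allowed.contains(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A run just ended; '<=' lets a later run of equal length replace the best
                if (best.length() <= current.length()) {
                    best = current.toString();
                }
                current.setLength(0);
            }
        }
        // A run that reaches the end of the string is compared here; ties go to it
        return current.length() >= best.length() ? current.toString() : best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSequence("He had 3 pennies and 5 quarters", "q3nniesp")); // pennies
        System.out.println(longestCommonSequence("They named the new place face cafe", "e face")); // ace face cafe
    }
}
```

It keeps the original's tie-breaking, so the last of several equally long runs wins.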

The output for the inputs you provided is:

string1 = He had 3 pennies and 5 quarters and string2 = q3nniesp ---> pennies
string1 = They named the new place face cafe and string2 = e face ---> ace face cafe

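Note the difference in the second output: based on the output behavior you described, the result of the above algorithm is correct, because ace face cafe is longer than e face cafe, and using characters from the given substring more than once is allowed.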

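Note one subtle point in the algorithm: if(secondaryMatch.size() <= primaryMatch.size())

When there are several matching substrings of the same length, the current implementation returns the last one (based on the order of characters in the original string). If you want the first one instead, replace <= with < in that condition, and also change the final check if(primaryMatch.size() < secondaryMatch.size()) to use <=; otherwise a run that reaches the very end of the string still wins a tie.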
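To see the tie-breaking behavior in isolation, here is a hypothetical index-based version of the same scan with an explicit preferFirst switch. It records the start and length of the best run instead of copying characters into lists, and it treats the end of the string as a non-matching character, so the last run is compared by the same rule as all the others.

```java
// Hypothetical index-based variant with an explicit tie-breaking switch.
// preferFirst = true keeps the first of several equally long runs;
// preferFirst = false keeps the last one, like the answer's method.
class TieBreaking {

    static String longestRun(String string, String sub, boolean preferFirst) {
        int bestStart = 0, bestLen = 0;
        int runStart = 0; // start index of the run currently being scanned
        for (int i = 0; i <= string.length(); i++) {
            // Position string.length() acts as a sentinel that closes the final run
            boolean allowed = i < string.length() && sub.indexOf(string.charAt(i)) != -1;
            if (allowed) {
                continue;
            }
            int runLen = i - runStart; // the run is string[runStart, i)
            boolean better = preferFirst
                    ? (runLen > bestLen)
                    : (runLen >= bestLen && runLen > 0);
            if (better) {
                bestStart = runStart;
                bestLen = runLen;
            }
            runStart = i + 1; // the next run can begin after this non-matching character
        }
        return string.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(longestRun("ab xx ba", "ab", true));  // ab
        System.out.println(longestRun("ab xx ba", "ab", false)); // ba
    }
}
```

With string = ab xx ba and sub = ab there are two runs of length 2, so the switch alone decides which one is returned.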

If the assumptions I described don't hold, please comment on this answer and I'll update it based on your clarification.
