Longest common substring without cutting a word

Time: 2019-04-11 05:13:44

Tags: java algorithm longest-substring

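I'm a beginner at programming, and I'm trying to solve one of the longest common subsequence/substring problems in Java. The algorithm question I'm working on is to find the longest common substring without cutting a word.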

For example: given string1 = "He had 3 pennies and 5 quarters" and string2 = "q3nniesp", it should return "pennies".

Another example: string1 = "They named the new place face cafe" and string2 = "e face", and the output would be "e face cafe".

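I tried to figure out the algorithm, but I couldn't decide whether I need to convert the strings to char arrays or evaluate them as strings. The fact that both strings can contain spaces confuses me.

I followed some existing stackoverflow questions and tried to modify this code from https://www.geeksforgeeks.org/: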

static String findLongestSubsequence(String str1, String str2) {

        char[] A = str1.toCharArray();
        char[] B = str2.toCharArray();
        if (A == null || B == null) return null;

        final int n = A.length;
        final int m = B.length;

        if (n == 0 || m == 0) return null;

        int[][] dp = new int[n+1][m+1];

        // Suppose A = a1a2..an-1an and B = b1b2..bn-1bn
        for (int i = 1; i <= n; i++ ) {
            for (int j = 1; j <= m; j++) {

                // If ends match the LCS(a1a2..an-1an, b1b2..bn-1bn) = LCS(a1a2..an-1, b1b2..bn-1) + 1
                if (A[i-1] == B[j-1]) dp[i][j] = dp[i-1][j-1] + 1;

                    // If the ends do not match the LCS of a1a2..an-1an and b1b2..bn-1bn is
                    // max( LCS(a1a2..an-1, b1b2..bn-1bn), LCS(a1a2..an-1an, b1b2..bn-1) )
                else dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);

            }
        }

        int lcsLen = dp[n][m];
        char[] lcs = new char[lcsLen];
        int index = 0;

        // Backtrack to find a LCS. We search for the cells
        // where we included an element which are those with
        // dp[i][j] != dp[i-1][j] and dp[i][j] != dp[i][j-1])
        int i = n, j = m;
        while (i >= 1 && j >= 1) {

            int v = dp[i][j];

            // The order of these may output different LCSs
            while(i > 1 && dp[i-1][j] == v) i--;
            while(j > 1 && dp[i][j-1] == v) j--;

            // Make sure there is a match before adding
            if (v > 0) lcs[lcsLen - index++ - 1] = A[i-1]; // or B[j-1];

            i--; j--;

        }

        return new String(lcs, 0, lcsLen);
    }

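But I keep getting the wrong output. For example, the first example gives output = 3nnies. I'm really stuck at this point; can anyone lend a hand or point me in the right direction? Thank you all.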

1 answer:

Answer 0 (score: 1)

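I tried your original algorithm; unfortunately, it was not heading in the right direction.

I assumed the following guidelines apply:

  • The matching substring consists of characters from the given substring, possibly in a different order.
  • A character from the given substring may appear multiple times in the matching substring.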
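To make the two guidelines concrete, here is a minimal sketch of the membership rule they imply. The class `AssumptionCheck` and the helper `usesOnlyCharsFrom` are hypothetical names introduced only for illustration and are not part of the answer's code: a candidate is acceptable if each of its characters occurs somewhere in the given substring, regardless of order or repetition.

```java
import java.util.HashSet;
import java.util.Set;

public class AssumptionCheck {
    // Hypothetical helper (not from the answer): returns true if every character
    // of candidate also occurs in sub, ignoring order and how often it repeats.
    static boolean usesOnlyCharsFrom(String candidate, String sub) {
        Set<Character> allowed = new HashSet<>();
        for (char c : sub.toCharArray()) {
            allowed.add(c);
        }
        for (char c : candidate.toCharArray()) {
            if (!allowed.contains(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(usesOnlyCharsFrom("pennies", "q3nniesp"));     // true
        System.out.println(usesOnlyCharsFrom("ace face cafe", "e face")); // true
        System.out.println(usesOnlyCharsFrom("place", "e face"));         // false
    }
}
```

Under this rule "pennies" qualifies even though "q3nniesp" lists its letters in a different order and contains only one "e".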

So I took the liberty of using a brute-force algorithm, with Java streams to build the result:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Complexity is O(N*M), where N is the length of the original string and M is the length of the substring
static String longestCommonSequence(String string, String sub) {
    List<Character> primaryMatch = new ArrayList<>();
    List<Character> secondaryMatch = new ArrayList<>();
    // N iterations loop on original string
    for(char c : string.toCharArray()) {
      // M iterations loop on the substring
      if(sub.indexOf(c) != -1) {
        primaryMatch.add(c);
      }
      else {
        if(!primaryMatch.isEmpty()) {
          // replace secondaryMatch content with the current longest match
          if(secondaryMatch.size() <= primaryMatch.size()) {
            secondaryMatch.clear();
            secondaryMatch.addAll(primaryMatch);
          }
          primaryMatch.clear();
        }
      }
    }
    if(primaryMatch.size() < secondaryMatch.size()) {
      return secondaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
    }
    return primaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
}

The output for the inputs you provided is:

string1 = "He had 3 pennies and 5 quarters" and string2 = "q3nniesp" ---> pennies
string1 = "They named the new place face cafe" and string2 = "e face" ---> ace face cafe

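Note the difference in the second output: given the output behavior you described, the result of the above algorithm is correct, because "ace face cafe" is longer than "e face cafe", since characters from the given substring may be used multiple times.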

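Note a subtle point in the algorithm: if(secondaryMatch.size() <= primaryMatch.size())

When several matching substrings have the same length, the current implementation returns the last one (based on the order of characters in the original string). If you want the first match returned instead, replace <= with <. For a tie involving a run that reaches the end of the string, the final check if(primaryMatch.size() < secondaryMatch.size()) would also need to become <=, because as written it still returns the later run when the sizes are equal.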
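The tie-breaking behavior can be sketched in a single pass that tracks run boundaries by index instead of copying characters into lists. This is an illustrative variant, not the answer's code: the class `LongestRun`, the method `longestRun`, and the `preferFirst` flag are hypothetical names. It assumes the same two guidelines and keeps the O(N*M) lookup through `indexOf`.

```java
public class LongestRun {
    // Hypothetical variant: scans string for maximal runs of characters that
    // occur in sub and returns the longest run. On ties, preferFirst = true
    // keeps the earliest run and preferFirst = false keeps the latest one.
    static String longestRun(String string, String sub, boolean preferFirst) {
        int bestStart = 0;
        int bestLen = 0;
        int runStart = 0;
        // Iterating up to and including string.length() treats the end of the
        // string as a run terminator, so no separate final comparison is needed.
        for (int i = 0; i <= string.length(); i++) {
            boolean inRun = i < string.length() && sub.indexOf(string.charAt(i)) != -1;
            if (inRun) {
                continue;
            }
            int runLen = i - runStart;
            boolean better = preferFirst ? runLen > bestLen : runLen >= bestLen;
            if (runLen > 0 && better) {
                bestStart = runStart;
                bestLen = runLen;
            }
            runStart = i + 1;
        }
        return string.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(longestRun("He had 3 pennies and 5 quarters", "q3nniesp", false)); // pennies
        System.out.println(longestRun("ab cd", "abcd", false)); // cd
        System.out.println(longestRun("ab cd", "abcd", true));  // ab
    }
}
```

Because the end of the string is handled inside the loop, a single comparison operator decides every tie, including one with a run that ends the string.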

If the assumptions I described do not hold, please comment on this answer and I will update it based on your clarification.