查找字符串中最长的重复子字符串所需的Java函数?

时间:2010-12-15 18:29:17

标签: java algorithm

需要java函数来查找字符串中最长的重复子字符串

For instance, if the input is “banana”,output should be "ana" and we have count the number of times it has appeared in this case it is 2.

解决方案如下

公共课测试{
   public static void main(String [] args){
       System.out.println(findLongestSubstring(“我喜欢ike”));
       System.out.println(findLongestSubstring(“女士我是亚当”));
       System.out.println(findLongestSubstring(“当生活给你柠檬水时,制作柠檬”));
       的System.out.println(findLongestSubstring( “香蕉”));
    }

public static String findLongestSubstring(String value) {
    String[] strings = new String[value.length()];
    String longestSub = "";

    //strip off a character, add new string to array
    for(int i = 0; i < value.length(); i++){
        strings[i] = new String(value.substring(i));
    }

    //debug/visualization
    //before sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Arrays.sort(strings);
    System.out.println();

    //debug/visualization
    //after sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Vector<String> possibles = new Vector<String>();
    String temp = "";
    int curLength = 0, longestSoFar = 0;

    /*
     * now that the array is sorted compare the letters
     * of the current index to those above, continue until 
     * you no longer have a match, check length and add
     * it to the vector of possibilities
     */ 
    for(int i = 1; i < strings.length; i++){
        for(int j = 0; j < strings[i-1].length(); j++){
            if (strings[i-1].charAt(j) != strings[i].charAt(j)){
                break;
            }
            else{
                temp += strings[i-1].charAt(j);
                curLength++;
            }
        }
        //this could alleviate the need for a vector
        //since only the first and subsequent longest 
        //would be added; vector kept for simplicity
        if (curLength >= longestSoFar){
            longestSoFar = curLength;
            possibles.add(temp);
        }
        temp = "";
        curLength = 0;
    }

    System.out.println("Longest string length from possibles: " + longestSoFar);

    //iterate through the vector to find the longest one
    int max = 0;
    for(int i = 0; i < possibles.size();i++){
        //debug/visualization
        System.out.println(possibles.elementAt(i));
        if (possibles.elementAt(i).length() > max){ 
            max = possibles.elementAt(i).length();
            longestSub = possibles.elementAt(i);
        }
    }
    System.out.println();
    //concerned with whitespace up until this point
    // "lemon" not " lemon" for example
    return longestSub.trim(); 
}

}

2 个答案:

答案 0 :(得分:4)

This is a common CS problem with a dynamic programming solution.

编辑(对于李杰):

您在技术上是正确的 - 这不是完全相同的问题。然而,这并不会使上面的链接无关紧要,并且如果提供的两个字符串相同,则可以使用相同的方法(特别是w.r.t.动态编程) - 只需要进行一次修改:不要考虑沿对角线的情况。或者,正如其他人指出的那样(例如LaGrandMere),请使用后缀树(也可在上面的链接中找到)。

编辑(适用于Deepak):

A Java implementation of the Longest Common Substring (using dynamic programming) can be found here。请注意,您需要修改它以忽略“对角线”(查看维基百科图)或最长的常见字符串本身!

答案 1 :(得分:1)

在Java中:Suffix Tree

感谢那些已经找到解决办法的人,我不知道。