Longest repeated byte sequence

Time: 2018-09-13 14:04:14

Tags: java byte

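I have code that is supposed to find the longest repeated sequence. But for the sequence

7888885466662716666

it reports the first match at indices 1-5 and the second at indices 2-6. It should output the 6s instead, because those are the ones that repeat. I'd like the algorithm to work through the input in this order: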

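  • check whether the first character is repeated anywhere in the whole string; if not,

  • check whether the first two characters are repeated anywhere,

  • check 3 ...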

But I don't know how to work this into my code. Can you show me?

    private int element;
    private int lastElement;
    private int length;

    private byte[] readByteFromFile(File name) throws IOException {
        return Files.readAllBytes(name.toPath());
    }

    private void searchByte(byte[] byteMass) throws InterruptedException {
        for (int i = 0; i < byteMass.length; i++) {
            int count = 0;
            for (int j = i + 1; j < byteMass.length; j++) {
                if (byteMass[i + count] == byteMass[j]) {
                    if (count >= length) {
                        length = count + 1;
                        element = i;
                        lastElement = j - count;
                    }
                    count++;
                } else {
                    count = 0;
                }
            }
        }
    }
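A minimal sketch of the length-by-length plan from the question, assuming that "repeated" means two copies that do not overlap. It is brute force and only meant to show the control flow, not to be fast on large files. The class name RepeatSearch and the helpers regionEquals, findRepeat and the getters are hypothetical; the fields element, lastElement and length mirror the question's. The early stop relies on one observation: if a block of length L+1 repeats without overlap, its first L bytes also repeat without overlap, so once some length has no repeat, no longer length can have one.

```java
import java.util.Arrays;

public class RepeatSearch {
    // Same roles as the question's fields: start of the first copy,
    // start of the later copy, and the length of the longest repeat found.
    private int element = -1;
    private int lastElement = -1;
    private int length = 0;

    // True if the len bytes starting at i equal the len bytes starting at j.
    private static boolean regionEquals(byte[] a, int i, int j, int len) {
        for (int k = 0; k < len; k++) {
            if (a[i + k] != a[j + k]) {
                return false;
            }
        }
        return true;
    }

    // Returns {i, j} for the first block of length len that appears again
    // starting at or after i + len (so the two copies do not overlap), or null.
    private static int[] findRepeat(byte[] a, int len) {
        for (int i = 0; i + 2 * len <= a.length; i++) {
            for (int j = i + len; j + len <= a.length; j++) {
                if (regionEquals(a, i, j, len)) {
                    return new int[] {i, j};
                }
            }
        }
        return null;
    }

    // Checks length 1, then 2, then 3, ... as described in the question.
    public void searchByte(byte[] byteMass) {
        element = -1;
        lastElement = -1;
        length = 0;
        for (int len = 1; 2 * len <= byteMass.length; len++) {
            int[] hit = findRepeat(byteMass, len);
            if (hit == null) {
                // No block of this length repeats, so no longer block can either.
                break;
            }
            element = hit[0];
            lastElement = hit[1];
            length = len;
        }
    }

    public int getElement() { return element; }
    public int getLastElement() { return lastElement; }
    public int getLength() { return length; }

    public static void main(String[] args) {
        byte[] input = "7888885466662716666".getBytes();
        RepeatSearch s = new RepeatSearch();
        s.searchByte(input);
        byte[] block = Arrays.copyOfRange(input, s.getElement(), s.getElement() + s.getLength());
        // prints: 6666 at indices 8 and 15
        System.out.println(new String(block) + " at indices " + s.getElement() + " and " + s.getLastElement());
    }
}
```

For the question's input this stops at length 5 and keeps the length-4 result, the block 6666 at indices 8 and 15. If nothing repeats at all, length stays 0 and element stays -1.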

2 answers:

Answer 0 (score: 2)

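Honestly, I'm not proud of this solution. In some other programming languages I'm fairly skilled in, I could get this solution quite easily (here is a possible implementation in 05AB1E for example), but in Java it's a lot harder.

I found a solution by converting the input byte[] to a String and checking its substrings. Performance-wise, though, it's terrible, so I'd recommend continuing to look for other ways to do this.

Anyway, my code does work, so I'm posting it regardless in case parts of it are useful or inspiring: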

class Main{
  public static void main(String[] args){
    Main m = new Main();
    m.test("7888885466662716666".getBytes());
  }

  private void test(byte[] input){
    String result = findLongestRepeatedSubsequence(input);
    System.out.println("The longest repeating subsequence in " + new String(input) + " is: " + result);
  }

  private String findLongestRepeatedSubsequence(byte[] byteMass){
    // Convert the bytes to a String:
    String bytesAsString = new String(byteMass);
    // Loop as long as this String has at least 1 character left:
    while(bytesAsString.length() > 0){
      // Split the String into characters, where each character is a loose String of length 1
      String[] charsAsStringArray = bytesAsString.split("");
      int length = charsAsStringArray.length;
      int maxCount = 0;
      int startingIndex = 0;
      // Loop `i` in the range [0, length_of_String_array)
      for(int i = 0; i < length; i++){
        // Take the substring where the first `i` characters are removed
        String subString = bytesAsString.substring(i);
        String currentChar = charsAsStringArray[i];
        // Count the amount of subsequent times the current character occurs at the start of the substring
        int count = subString.length() - subString.replaceFirst(currentChar+"*", "").length();
        // If this count is larger than our current maxCount:
        if(count > maxCount){
          // Replace the maxCount with this count
          maxCount = count;
          // And set the index where we've found this longest subsequence (`i`) as well
          startingIndex = i;
        }
      }
      // After we've checked all substrings, get the longest subsequence we've found
      String longestSub = bytesAsString.substring(startingIndex, startingIndex + maxCount);
      // Split the entire String with this longest subsequence to get its occurrence-count
      int occurrenceCounter = bytesAsString.split(longestSub, -1).length - 1;
      // If we've found a subsequence that occurs at least twice:
      if(occurrenceCounter > 1){
        // Return it as result
        return longestSub;
      }
      // If this longest subsequence only occurs once:
      else{
        // Remove the first character of this found subsequence from the String
        bytesAsString = bytesAsString.substring(0, startingIndex) +
                        (startingIndex < length-1 ? 
                           bytesAsString.substring(startingIndex + 1)
                         :
                           "");
      }
    }
    // Mandatory return if the input is empty
    return null;
  }
}

Try it online (this version contains some additional print lines compared to the code above).
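The answer above converts with new String(byteMass), which decodes using the platform default charset, and counts matches with split(longestSub, -1), which treats longestSub as a regular expression. For arbitrary file bytes both can misbehave: byte sequences that are invalid in a multi-byte charset such as UTF-8 get replaced, and characters like '.' or '*' change the meaning of the regex. A minimal sketch of two hypothetical helpers (bytesToText, countOccurrences) that avoid this, assuming ISO-8859-1 is an acceptable one-char-per-byte mapping:

```java
import java.nio.charset.StandardCharsets;

public class SafeStringHelpers {
    // ISO-8859-1 maps every byte value 0..255 to exactly one char, so the conversion is lossless.
    static String bytesToText(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Counts non-overlapping occurrences of needle in haystack, treating needle literally (no regex).
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length(); // skip past this match so matches cannot overlap
        }
        return count;
    }

    public static void main(String[] args) {
        String text = bytesToText("7888885466662716666".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(countOccurrences(text, "6666"));  // 2
        System.out.println(countOccurrences(text, "88888")); // 1
        // split(".") would treat "." as "any character"; indexOf treats it literally.
        System.out.println(countOccurrences("a.b.c", "."));  // 2
    }
}
```

Like split(longestSub, -1).length - 1, countOccurrences counts non-overlapping matches, so it could stand in for that line. The replaceFirst(currentChar + "*", "") call is also regex-based and would need similar care for arbitrary bytes.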

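Answer 1 (score: 2)

Here's a hacky solution I wrote yesterday...

Basically, it checks whether input.charAt(i) == input.charAt(i + 1); if so, it runs a second loop until the characters stop matching, appending each one to a String, and then adds that String to a List. Then it repeats.

Finally, it checks which entry occurs most often in the List (shamelessly stolen from here).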

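import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        String input = "7888885466662716666";
        addToList(input);
    }

    // Collects every run of two or more equal adjacent characters into a list,
    // then prints the run that occurs most often.
    public static void addToList(String input) {
        String temp;
        List<String> l = new ArrayList<>();
        for (int i = 0; i < input.length() - 1; i++) {
            if (input.charAt(i) == input.charAt(i + 1)) {
                temp = String.valueOf(input.charAt(i));
                for (int j = i; j < input.length() - 1; j++) {
                    if (input.charAt(j) == input.charAt(j + 1)) {
                        temp += String.valueOf(input.charAt(j + 1));
                        if (j == input.length() - 2) {
                            i = j;
                            if (!temp.isEmpty()) {
                                l.add(temp);
                            }
                            break;
                        }
                    } else {
                        i = j - 1;
                        if (!temp.isEmpty()) {
                            l.add(temp);
                        }
                        break;
                    }
                }
            }
        }
        System.out.println(getHighestOccurences(l));
    }

    // Returns the element with the highest frequency in the list (ties depend on HashSet iteration order).
    public static String getHighestOccurences(List<String> list) {
        int max = 0;
        int curr;
        String currKey = null;
        Set<String> unique = new HashSet<>(list);
        for (String key : unique) {
            curr = Collections.frequency(list, key);
            if (max < curr) {
                max = curr;
                currKey = key;
            }
        }
        return currKey;
    }
}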

With String input = "7888885466662716666"; and a call to addToList(input);, the output is:

6666

Online Demo
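The same two steps (collect runs of equal adjacent characters, then pick the most frequent run) can also be written with a backreference regex and a counting map. This is a swapped-in technique, not the answer's code, and the class and method names are hypothetical. Like the answer, it only looks at runs of a single repeated character, so it would not find a repeated block such as 1212.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MostFrequentRun {
    // A character followed by one or more copies of itself, i.e. a run of length >= 2.
    // DOTALL lets '.' also match line terminators.
    private static final Pattern RUN = Pattern.compile("(.)\\1+", Pattern.DOTALL);

    static String mostFrequentRun(String input) {
        // LinkedHashMap keeps first-seen order, so ties go to the run that appeared first.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            // The greedy \1+ consumes the whole run, so "88888" is counted once as a single run.
            counts.merge(m.group(), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null if the input has no runs at all
    }

    public static void main(String[] args) {
        System.out.println(mostFrequentRun("7888885466662716666")); // prints 6666
    }
}
```

Counting in a single map pass avoids calling Collections.frequency once per distinct run, and the insertion-ordered map makes the result for ties deterministic, unlike iterating a HashSet.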