Question

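The JavaDoc for java.lang.String says nothing about the algorithm behind the default indexOf(String) substring search. So my question is: which substring search algorithms do the different JREs use?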

Answer 1

The JDK ships with src.zip, which shows the implementation:

/**
 * Code shared by String and StringBuffer to do searches. The
 * source is the character array being searched, and the target
 * is the string being searched for.
 *
 * @param   source       the characters being searched.
 * @param   sourceOffset offset of the source string.
 * @param   sourceCount  count of the source string.
 * @param   target       the characters being searched for.
 * @param   targetOffset offset of the target string.
 * @param   targetCount  count of the target string.
 * @param   fromIndex    the index to begin searching from.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
                   char[] target, int targetOffset, int targetCount,
                   int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first  = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j] ==
                     target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}
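
For anyone who wants to step through that loop without digging into the JDK internals, here is a minimal sketch of the same idea (scan for the first character, then compare the rest) written against the public String API only. The class name NaiveIndexOf and its method are hypothetical, made up for illustration; this is not the JDK source, just a restatement of the logic shown above.

```java
// Hypothetical illustration class, not part of any JDK.
final class NaiveIndexOf {

    // Same strategy as the JDK 6/7 code above: find the first character
    // of the target, then compare the remaining characters one by one.
    static int indexOf(String source, String target, int fromIndex) {
        int sourceCount = source.length();
        int targetCount = target.length();
        if (fromIndex >= sourceCount) {
            return targetCount == 0 ? sourceCount : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }

        char first = target.charAt(0);
        int max = sourceCount - targetCount; // last index where a match can start

        for (int i = fromIndex; i <= max; i++) {
            // Skip ahead to the next occurrence of the first character.
            while (i <= max && source.charAt(i) != first) {
                i++;
            }
            if (i > max) {
                break;
            }
            // First character matched; compare the rest of the target.
            int j = 1;
            while (j < targetCount && source.charAt(i + j) == target.charAt(j)) {
                j++;
            }
            if (j == targetCount) {
                return i; // whole target found
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world", 0)); // prints 6
        System.out.println(indexOf("aaa", "b", 0));             // prints -1
    }
}
```

In the worst case this does on the order of sourceCount * targetCount character comparisons, since it uses no precomputed tables.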

Answer 2

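FWIW (in case this question is about the performance of different algorithms): on suitable hardware and with a recent Oracle JVM (6u21 and later; see the bug report for details), String.indexOf is implemented via the relevant SSE 4.2 intrinsics. See chapter 2.3 of this Intel reference doc.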

Answer 3

Here is what has been found so far:

Oracle JDK 1.6 / 1.7，OpenJDK 6/7

static int indexOf(char[] source, int sourceOffset, int sourceCount,
                   char[] target, int targetOffset, int targetCount,
                   int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first  = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j] ==
                     target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

IBM JDK 5.0

public int indexOf(String subString, int start) {
    if (start < 0) start = 0;
    int subCount = subString.count;
    if (subCount > 0) {
        if (subCount + start > count) return -1;
        char[] target = subString.value;
        int subOffset = subString.offset;
        char firstChar = target[subOffset];
        int end = subOffset + subCount;
        while (true) {
            int i = indexOf(firstChar, start);
            if (i == -1 || subCount + i > count) return -1; // handles subCount > count || start >= count
            int o1 = offset + i, o2 = subOffset;
            while (++o2 < end && value[++o1] == target[o2]);
            if (o2 == end) return i;
            start = i + 1;
        }
    } else return start < count ? start : count;
}

Sabre SDK

  public int indexOf(String str, int fromIndex)
  {
    if (fromIndex < 0)
      fromIndex = 0;
    int limit = count - str.count;
    for ( ; fromIndex <= limit; fromIndex++)
      if (regionMatches(fromIndex, str, 0, str.count))
        return fromIndex;
    return -1;
  }
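
The Sabre SDK variant is the plainest of the three: try every start position and let regionMatches do the comparison. A minimal standalone sketch of that loop, using only the public String.regionMatches(int, String, int, int) method, might look like this. The class name RegionMatchSearch is hypothetical and the code is a restatement for experimentation, not the Sabre source.

```java
// Hypothetical illustration class; mirrors the brute-force loop above
// using only public java.lang.String methods.
final class RegionMatchSearch {

    static int indexOf(String text, String str, int fromIndex) {
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        // Last start position at which str can still fit inside text.
        int limit = text.length() - str.length();
        for (int i = fromIndex; i <= limit; i++) {
            if (text.regionMatches(i, str, 0, str.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "o", 5)); // prints 7
        System.out.println(indexOf("abc", "", 5));          // prints -1
    }
}
```

Note one edge case where a loop written this way differs from the Oracle/OpenJDK code above: for an empty search string with fromIndex beyond the end of the text, this loop returns -1, whereas the Oracle/OpenJDK code returns sourceCount.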

Feel free to update this post.

Answer 4

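Since indexOf is most often used to find small substrings in reasonably small strings, I think it is safe to assume that a fairly simple algorithm like the one Victor showed is used. There are more advanced algorithms that handle large strings better, but AFAIK they perform worse on relatively short strings.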
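
To make the trade-off concrete, here is a rough sketch of one such "more advanced" algorithm, Boyer-Moore-Horspool. To be clear, this is my own illustration and not something any of the JREs listed in this thread is known to use; the class name HorspoolSearch is hypothetical. The point is the setup step: a shift table has to be built before the first comparison, which is pure overhead when both strings are short.

```java
import java.util.Arrays;

// Hypothetical illustration of Boyer-Moore-Horspool; not taken from any JRE.
final class HorspoolSearch {

    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        if (m > n) {
            return -1;
        }

        // Preprocessing: one shift entry per possible char value (65536 ints).
        // This is the fixed cost that the simple scan does not pay.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int k = 0; k < m - 1; k++) {
            shift[pattern.charAt(k)] = m - 1 - k;
        }

        int i = 0;
        while (i <= n - m) {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(i + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                return i; // full match
            }
            // Skip ahead based on the last character of the window.
            i += shift[text.charAt(i + m - 1)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world")); // prints 6
    }
}
```

On long inputs the skips can let it examine far fewer characters than a left-to-right scan, but for the short strings typical of indexOf calls, filling the table likely costs more than the search itself.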
