Question

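Given a string, find the first non-repeating character in it and return its index. If it doesn't exist, return -1. You may assume the string contains only lowercase letters.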

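I define a hash that keeps track of characters already seen. I traverse the string from left to right and check whether the current character is in the hash. If it is, I continue; otherwise, in another loop, I traverse the rest of the string to see whether the current character occurs again. If it does not, I return its index; if it does, I add it to the hash.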

def firstUniqChar(s):

    track = {}
    for index, i in enumerate(s):
        if i in track:
            continue
        elif i in s[index+1:]: # For the last character the slice is '', and i in '' is False
            track[i] = 1
            continue
        else:
            return index
    return -1

firstUniqChar('timecomplexity')

What is the time complexity (average and worst case) of my algorithm?

Answer 1

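Your algorithm has time complexity O(kn), where k is the number of distinct characters in the string. If k is a constant, then it is O(n). Since the problem statement explicitly bounds the number of possible characters ("assume lowercase (ASCII) letters"), k is a constant and your algorithm runs in O(n) time on this problem. Even as n grows without bound, you only ever take O(1) slices of the string, so your algorithm remains O(n). If you removed track, it would be O(n²). Here is how your function, as written, scales: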

In [36]: s = 'abcdefghijklmnopqrstuvwxyz' * 10000

In [37]: %timeit firstUniqChar(s)
100 loops, best of 3: 18.2 ms per loop

In [38]: s = 'abcdefghijklmnopqrstuvwxyz' * 20000

In [37]: %timeit firstUniqChar(s)
10 loops, best of 3: 36.3 ms per loop

In [38]: s = 'timecomplexity' * 40000 + 'a'

In [39]: %timeit firstUniqChar(s)
10 loops, best of 3: 73.3 ms per loop

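So T(n) is still pretty much O(n): the running time scales linearly with the number of characters in the string, even though these inputs are the worst case for your algorithm, where there is no unique character (or only one, at the very end).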
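To make the "only O(1) slices" argument concrete, here is an instrumented copy of firstUniqChar that also counts how many s[index+1:] slices it takes. The name first_uniq_char_counting_slices is hypothetical, and it uses a set instead of the track dict, which does not change the complexity. The slice count is bounded by the number of distinct characters, however long the string gets:

```python
import string


def first_uniq_char_counting_slices(s):
    """Hypothetical instrumented copy of firstUniqChar.

    Returns (index, slices): the usual answer plus how many O(n)
    slice-and-search operations were performed.
    """
    track = set()  # characters already known to repeat
    slices = 0
    for index, ch in enumerate(s):
        if ch in track:
            continue  # O(1) average-case hash lookup, no slicing
        slices += 1  # at most one slice per distinct character
        if ch in s[index + 1:]:  # O(n) copy plus O(n) search
            track.add(ch)
        else:
            return index, slices
    return -1, slices


print(first_uniq_char_counting_slices(string.ascii_lowercase * 1000))
print(first_uniq_char_counting_slices('timecomplexity' * 1000 + 'a'))
```

For string.ascii_lowercase * 1000 it reports 26 slices, and the count would stay 26 for * 10000; only the length of each slice grows with n, which is where the O(kn) comes from.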

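Here is a method that is not very efficient, but simple and smart: first compute a character histogram with collections.Counter, then iterate over the string to find the first character whose count is 1: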

from collections import Counter

def first_uniq_char_ultra_smart(s):
    counts = Counter(s)          # pass 1: character histogram, O(n)
    for i, c in enumerate(s):    # pass 2: first index whose count is 1, O(n)
        if counts[c] == 1:
            return i

    return -1

first_uniq_char_ultra_smart('timecomplexity')

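The time complexity is O(n): Counter builds the histogram in O(n) time, and we then enumerate the string once more, which is another O(n). In practice, I believe my algorithm has low constants, since Counter uses a standard dictionary.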
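Because the question guarantees lowercase letters only, the histogram could also be a fixed list of 26 counters instead of a Counter. This is a sketch under that assumption (other characters would raise IndexError or be miscounted), with a hypothetical function name. It is still two O(n) passes; whether it beats Counter in practice would need measuring:

```python
def first_uniq_char_array(s):
    """Hypothetical variant of first_uniq_char_ultra_smart.

    ASSUMES s contains only 'a'..'z'; the histogram is a list of 26 ints
    indexed by ord(c) - ord('a') instead of a Counter.
    """
    counts = [0] * 26
    base = ord('a')
    for c in s:  # pass 1: build the histogram, O(n)
        counts[ord(c) - base] += 1
    for i, c in enumerate(s):  # pass 2: first character seen exactly once, O(n)
        if counts[ord(c) - base] == 1:
            return i
    return -1


print(first_uniq_char_array('timecomplexity'))
```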

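Now let's write a really stupid brute-force algorithm. Since you may assume the string contains only lowercase letters, let's exploit that assumption: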

import string

def first_uniq_char_very_stupid(s):
    indexes = []
    for c in string.ascii_lowercase:
        if s.count(c) == 1:            # one C-level pass over s per letter
            indexes.append(s.find(c))  # only runs for letters that occur once

    # min(..., default=-1) requires Python 3.4+
    return min(indexes, default=-1)

Let's test my algorithms and some of the algorithms from the other answers on Python 3.5. I chose a case that is pathologically bad for my algorithm:

In [30]: s = 'timecomplexity' * 10000 + 'a'

In [31]: %timeit first_uniq_char_ultra_smart(s)
10 loops, best of 3: 35 ms per loop

In [32]: %timeit karin(s)
100 loops, best of 3: 11.7 ms per loop

In [33]: %timeit john(s)
100 loops, best of 3: 9.92 ms per loop

In [34]: %timeit nicholas(s)
100 loops, best of 3: 10.4 ms per loop

In [35]: %timeit first_uniq_char_very_stupid(s)
1000 loops, best of 3: 1.55 ms per loop

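So my stupid algorithm is the fastest: the only letter it has to locate with find is the lone a at the end. My smart algorithm is the slowest. Besides this being its worst case, another reason it performs poorly is that on Python 3.5 OrderedDict is implemented in C, whereas Counter is written in Python.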

Here is a better test:

In [60]: s = string.ascii_lowercase * 10000

In [61]: %timeit nicholas(s)
100 loops, best of 3: 18.3 ms per loop

In [62]: %timeit karin(s)
100 loops, best of 3: 19.6 ms per loop

In [63]: %timeit john(s)
100 loops, best of 3: 18.2 ms per loop

In [64]: %timeit first_uniq_char_very_stupid(s)
100 loops, best of 3: 2.89 ms per loop

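So my "stupid" algorithm turns out not to be so stupid after all: it leverages the speed of C while minimizing the number of Python-level iterations, and wins clearly on this problem.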
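The timings only compare speed. A quick randomized cross-check can confirm that the question's function and the two versions from this answer return the same index. This is a sketch with a hypothetical cross_check helper; the other answers' functions (karin, john, nicholas) are not shown in this thread, so they are left out:

```python
import random
import string
from collections import Counter


def firstUniqChar(s):
    # The question's algorithm, same logic as posted.
    track = {}
    for index, i in enumerate(s):
        if i in track:
            continue
        elif i in s[index + 1:]:
            track[i] = 1
        else:
            return index
    return -1


def first_uniq_char_ultra_smart(s):
    counts = Counter(s)
    for i, c in enumerate(s):
        if counts[c] == 1:
            return i
    return -1


def first_uniq_char_very_stupid(s):
    indexes = [s.find(c) for c in string.ascii_lowercase if s.count(c) == 1]
    return min(indexes, default=-1)


def cross_check(trials=1000, max_len=30, alphabet='abcde', seed=0):
    """Hypothetical helper: run all three on random strings over a small
    alphabet (so repeats are common); return the first input on which they
    disagree, or None if they always agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = ''.join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        results = {firstUniqChar(s),
                   first_uniq_char_ultra_smart(s),
                   first_uniq_char_very_stupid(s)}
        if len(results) != 1:
            return s
    return None


print(cross_check())
```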

Answer 2

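~~As others have said, your algorithm is O(n²) because of the nested linear search.~~ As @Antti discovered, the OP's algorithm is linear and bounded by O(kn), with k being the number of all possible lowercase letters.

My proposal for an O(n) solution: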
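Spelled out as a rough bound (a sketch; c_1 and c_2 are unspecified constants): each distinct character triggers at most one s[index+1:] slice-and-search of length at most n, and every position costs one average-case O(1) dict lookup:

```latex
T(n) \le \underbrace{k \, c_1 n}_{\text{at most } k \text{ slices}}
       + \underbrace{c_2 n}_{\text{dict lookups}} = O(kn),
\qquad k \le 26 \;\Rightarrow\; T(n) = O(n)
```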

Answer 3

Your algorithm is O(n²), because you perform a "hidden" iteration over a slice of s inside your loop over s.

A faster algorithm is:

def first_unique_character(s):
    good = {} # char:idx
    bad = set() # char
    for index, ch in enumerate(s):
        if ch in bad:
            continue
        if ch in good: # new repeat
            bad.add(ch)
            del good[ch]
        else:
            good[ch] = index

    if not good:
        return -1

    return min(good.values())

This is O(n), because the in lookups use hash tables, and the number of distinct characters should be much smaller than len(s).
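A small variation, assuming Python 3.7 or later, where dicts are guaranteed to keep insertion order: a character is never put back into good once it is in bad, so the remaining keys stay in first-occurrence order and the first remaining value is already the minimum index. The function name below is hypothetical:

```python
def first_unique_character_ordered(s):
    """Hypothetical variant of first_unique_character.

    ASSUMES Python 3.7+ (insertion-ordered dicts).
    """
    good = {}    # char -> index of its only occurrence so far, first-seen order
    bad = set()  # chars seen more than once
    for index, ch in enumerate(s):
        if ch in bad:
            continue
        if ch in good:  # second occurrence: move it from good to bad
            bad.add(ch)
            del good[ch]
        else:
            good[ch] = index
    # Deleted keys are never re-inserted, so the first remaining value is the
    # smallest index; next() with a default replaces min() and the empty check.
    return next(iter(good.values()), -1)


print(first_unique_character_ordered('timecomplexity'))
```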

Time complexity calculation for my algorithm
