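What is the most space- and time-efficient solution for finding the first non-repeating character in a string like aabccbdcbe?
The answer here is d. As I see it, this can be done in two ways: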
Answer 0 (score: 17)
Here is a very simple O(n) solution:
def fn(s):
    order = []     # distinct characters, in order of first appearance
    counts = {}    # character -> number of occurrences
    for x in s:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
            order.append(x)
    for x in order:
        if counts[x] == 1:
            return x
    return None
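We walk through the string once. When we meet a new character, we store it in counts with a value of 1 and append it to order. When we meet a character we have already seen, we increment its value in counts. Finally, we walk through order until we find a character whose value in counts is 1, and return it.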
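Since the question also asks about space, here is a sketch of the same two-pass counting idea under an extra assumption the question does not state: the input contains only the lowercase ASCII letters 'a' to 'z'. The dict is then replaced by a fixed list of 26 counters, so the extra space no longer grows with the input. The function name first_unique_lowercase is hypothetical.

```python
def first_unique_lowercase(s):
    """Two-pass count; ASSUMES s contains only the letters 'a'..'z'."""
    counts = [0] * 26                      # one counter per letter, fixed size
    for x in s:                            # first pass: count every letter
        counts[ord(x) - ord('a')] += 1
    for x in s:                            # second pass: scan in original order
        if counts[ord(x) - ord('a')] == 1:
            return x
    return None                            # no letter occurs exactly once

print(first_unique_lowercase('aabccbdcbe'))  # d
```

The second pass scans s itself rather than an order list; it stops at the first letter whose count is 1, so it returns the same answer as fn().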
Answer 1 (score: 7)
I think removing the repeated characters from the string could significantly reduce the number of operations. For example:
s = "aabccbdcbe"
while s != "":
slen0 = len(s)
ch = s[0]
s = s.replace(ch, "")
slen1 = len(s)
if slen1 == slen0-1:
print ch
break;
else:
print "No answer"
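Wrapped in a function, the same idea makes its cost visible. The name first_unique_by_replace and the call counter are additions for illustration only. Each loop iteration removes every copy of one character, so the loop runs at most once per distinct character, and each str.replace() call is one pass over what remains of the string.

```python
def first_unique_by_replace(s):
    """Replace-based search; returns (char_or_None, number_of_replace_calls)."""
    calls = 0
    while s:
        before = len(s)
        ch = s[0]
        s = s.replace(ch, "")       # drops all copies of ch in one pass
        calls += 1
        if len(s) == before - 1:    # only one copy existed, so ch is unique
            return ch, calls
    return None, calls              # every character was repeated

print(first_unique_by_replace('aabccbdcbe'))  # ('d', 4)
```

For 'aabccbdcbe' the loop removes a, b and c before reaching d, so only four replace() calls are needed.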
Answer 2 (score: 6)
This list comprehension gives you the characters that occur only once, in the order in which they appear:
In [61]: s = 'aabccbdcbe'
In [62]: [a for a in s if s.count(a) == 1]
Out[62]: ['d', 'e']
Then take the first entry:
In [63]: [a for a in s if s.count(a) == 1][0]
Out[63]: 'd'
If you only need the first entry, a generator works just as well:
In [69]: next(a for a in s if s.count(a) == 1)
Out[69]: 'd'
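Two details worth knowing about the generator version: each s.count(a) scans the whole string, so the worst case is O(n²), and next() raises StopIteration when no character occurs exactly once. A sketch that passes a default to next() instead (the function name is hypothetical):

```python
def first_unique_count(s):
    # next() with a default returns None instead of raising StopIteration
    # when no character occurs exactly once; each s.count() is an O(n) scan
    return next((a for a in s if s.count(a) == 1), None)

print(first_unique_count('aabccbdcbe'))  # d
print(first_unique_count('aabb'))        # None
```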
Answer 3 (score: 4)
The speed of the search depends on several factors:
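In the code below, I first build a string s with the help of random.choice() and a set of single-occurrence characters named unik, by concatenating two strings s1 and s2 (s = s1 + s2), where:
s1 is a string of at most nwo characters that contains none of the single-occurrence characters;
s2 is a string of nwi characters in which the single-occurrence characters appear (each at most once).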
#### creation of s from s1 and s2 #########
from random import choice

def without(u, n):
    # draws n random letters and skips those in u,
    # so the result is usually shorter than n
    letters = list('abcdefghijklmnopqrstuvwxyz')
    for i in range(n):
        c = choice(letters)
        if c not in u:
            yield c

def with_un(u, n):
    # draws n random letters; a letter in u is removed from the pool
    # after it is drawn, so it occurs at most once
    letters = list('abcdefghijklmnopqrstuvwxyz')
    for i in range(n):
        c = choice(letters)
        yield c
        if c in u:
            letters.remove(c)

unik = 'ekprw'
nwo, nwi = 0, 500
s1 = ''.join(without(unik, nwo))
s2 = ''.join(with_un(unik, nwi))
s = s1 + s2

if s1:
    print('%-27ss2 : %d chars' % ('s1 : %d chars' % len(s1), len(s2)))
    for el in unik:
        print('s1.count(%s) == %-12ds2.count(%s) == %d'
              % (el, s1.count(el), el, s2.count(el)))
    others = [c for c in 'abcdefghijklmnopqrstuvwxyz' if c not in unik]
    print('s1.count(others)>1 %s' % all(s1.count(c) > 1 for c in others))
else:
    print("s1 == '' len(s2) == %d" % len(s2))
    for el in unik:
        print(' - s2.count(%s) == %d' % (el, s2.count(el)))
print('len of s == %d\n' % len(s))
Then comes the benchmark. Varying the numbers nwo and nwi shows the effect on speed:
### benchmark of three solutions #################
from time import perf_counter as clock   # time.clock() was removed in Python 3.8
from collections import Counter, OrderedDict

# Janne Karila
class OrderedCounter(Counter, OrderedDict):
    pass

te = clock()
c = OrderedCounter(s)
rjk = next(item for item, count in c.items() if count == 1)
tf = clock() - te
print('Janne Karila %.5f found: %s' % (tf, rjk))

# eyquem
te = clock()
candidates = set(s)
li = []
for x in s:
    if x in candidates:          # first occurrence: becomes a candidate
        li.append(x)
        candidates.remove(x)
    elif x in li:                # second occurrence: no longer unique
        li.remove(x)
rey = li[0]
tf = clock() - te
print('eyquem %.5f found: %s' % (tf, rey))

# TyrantWave
te = clock()
rty = next(a for a in s if s.count(a) == 1)
tf = clock() - te
print('TyrantWave %.5f found: %s' % (tf, rty))
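A single clock() reading of a sub-millisecond run is noisy. One way to get steadier numbers is to wrap each approach in a function and let timeit take the best of several repeats. The sketch below does that; the wrapper names (by_count, by_counter, by_replace) and the sample string are hypothetical, and by_counter relies on Counter keeping first-insertion order, which holds on Python 3.7+.

```python
import timeit
from collections import Counter

def by_count(s):
    # TyrantWave-style: one O(n) count() per scanned character
    return next((a for a in s if s.count(a) == 1), None)

def by_counter(s):
    # Counter iterates in first-seen order on Python 3.7+
    return next((ch for ch, n in Counter(s).items() if n == 1), None)

def by_replace(s):
    # Roman-style: strip one character at a time with str.replace()
    while s:
        before = len(s)
        ch = s[0]
        s = s.replace(ch, '')
        if len(s) == before - 1:
            return ch
    return None

sample = 'ab' * 500 + 'cz' + 'c'   # hypothetical input; first unique char is 'z'
for f in (by_count, by_counter, by_replace):
    best = min(timeit.repeat(lambda: f(sample), number=20, repeat=5))
    print('%-10s %.6f s per 20 calls  found: %s' % (f.__name__, best, f(sample)))
```

Taking the minimum of the repeats is a common choice because it is the run least disturbed by other activity on the machine.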
Some results
Empty s1, nwo = 0, nwi = 50:
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 1
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.00077 found: e
eyquem 0.00013 found: e
TyrantWave 0.00005 found: e
TyrantWave's solution is faster here because the first single-occurrence character is found quickly, near the start of the string.
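nwo = 300 and nwi = 50
(s1 ends up with fewer than nwo characters, 245 in this run, because occurrences of the single-occurrence characters are dropped while s1 is built; see the function without())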
s1 : 245 chars s2 : 50 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 295
Janne Karila 0.00167 found: e
eyquem 0.00030 found: e
TyrantWave 0.00042 found: e
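This time TyrantWave's solution takes longer than mine, because it has to count the occurrences of every character in the first part of s, that is, in s1, which contains no single-occurrence characters (those are in the second part, s2).
However, for my solution to come out faster, nwo needs to be noticeably larger than nwi.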
nwo = 300 and nwi = 5000
s1 : 240 chars s2 : 5000 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 5240
Janne Karila 0.01510 found: p
eyquem 0.00534 found: p
TyrantWave 0.00294 found: p
As the length of s2 increases, TyrantWave's solution does better.
Draw your own conclusions.
Good idea, Roman!
I added Roman's solution to my benchmark, and it wins!
I also made some tiny modifications to try to improve his solution.
# Roman Fursenko
srf = s[:]
te = clock()
while srf != "":
    slen0 = len(srf)
    ch = srf[0]
    srf = srf.replace(ch, "")
    slen1 = len(srf)
    if slen1 == slen0 - 1:
        rrf = ch
        break
else:
    rrf = "No answer"
tf = clock() - te
print('Roman Fursenko %.6f found: %s' % (tf, rrf))

# Roman Fursenko improved
srf = s[:]
te = clock()
while srf:                     # an empty string is falsy; `is ""` is unreliable
    slen0 = len(srf)
    ch = srf[0]                # read here, not left over from the loop above
    srf = srf.replace(ch, "")
    if len(srf) == slen0 - 1:
        rrf = ch
        break
else:
    rrf = "No answer"
tf = clock() - te
print('Roman improved %.6f found: %s' % (tf, rrf))
print('\nindex of %s in the string : %d' % (rrf, s.index(rrf)))
The results:
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 1
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.0032538 found: r
eyquem 0.0001249 found: r
TyrantWave 0.0000534 found: r
Roman Fursenko 0.0000299 found: r
Roman improved 0.0000263 found: r
index of r in the string : 1
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 0
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.0008183 found: a
eyquem 0.0001285 found: a
TyrantWave 0.0000550 found: a
Roman Fursenko 0.0000433 found: a
Roman improved 0.0000391 found: a
index of a in the string : 4
s1 : 240 chars s2 : 50 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 0
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 290
Janne Karila 0.0016390 found: e
eyquem 0.0002956 found: e
TyrantWave 0.0004112 found: e
Roman Fursenko 0.0001428 found: e
Roman improved 0.0001277 found: e
index of e in the string : 242
s1 : 241 chars s2 : 5000 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 5241
Janne Karila 0.0148231 found: r
eyquem 0.0053283 found: r
TyrantWave 0.0030166 found: r
Roman Fursenko 0.0007414 found: r
Roman improved 0.0007230 found: r
index of r in the string : 250
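Thanks to Roman's code, I learned something:
s.replace() creates a new string, and because of that I thought it would be a slow method.
However, it turns out to be very fast, and I don't know why.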
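One plausible explanation, offered as an assumption rather than something the benchmark above proves: str.replace() does its scan and copy in C, whereas the other approaches run a Python-level loop over every character, and the replace loop only runs once per distinct character. A quick way to compare the two kinds of pass on your own machine (the helper names and the test text are hypothetical):

```python
import timeit

text = 'aabccbdcbe' * 1000          # hypothetical test input

def remove_with_replace(t, ch):
    return t.replace(ch, '')                  # scan and copy done in C

def remove_with_loop(t, ch):
    return ''.join(c for c in t if c != ch)   # same result, Python-level loop

assert remove_with_replace(text, 'c') == remove_with_loop(text, 'c')
for f in (remove_with_replace, remove_with_loop):
    print(f.__name__, timeit.timeit(lambda: f(text, 'c'), number=200))
```

Both functions build a new string, so the difference in timing comes from where the per-character work happens, not from avoiding the copy.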
Oin's solution is the worst:
# Oin
from operator import itemgetter
seen = set()
only_appear_once = dict()   # char -> index of its single occurrence so far
te = clock()
for i, x in enumerate(s):
    if x in seen:
        # repeated character: drop it; pop() with a default so that a
        # third occurrence cannot re-add it to only_appear_once
        only_appear_once.pop(x, None)
    else:
        seen.add(x)
        only_appear_once[x] = i
fco = min(only_appear_once.items(), key=itemgetter(1))[0]
tf = clock() - te
print('Oin %.7f found: %s' % (tf, fco))
Results:
s1 == '' len(s2) == 50
Oin 0.0007124 found: e
Janne Karila 0.0008057 found: e
eyquem 0.0001252 found: e
TyrantWave 0.0000712 found: e
Roman Fursenko 0.0000335 found: e
Roman improved 0.0000335 found: e
index of e in the string : 2
s1 : 237 chars s2 : 50 chars
Oin 0.0029783 found: k
Janne Karila 0.0014714 found: k
eyquem 0.0002889 found: k
TyrantWave 0.0005598 found: k
Roman Fursenko 0.0001458 found: k
Roman improved 0.0001372 found: k
index of k in the string : 246
s1 : 236 chars s2 : 5000 chars
Oin 0.0801739 found: e
Janne Karila 0.0155715 found: e
eyquem 0.0044623 found: e
TyrantWave 0.0027548 found: e
Roman Fursenko 0.0007255 found: e
Roman improved 0.0007199 found: e
index of e in the string : 244
Answer 4 (score: 2)
collections.Counter counts efficiently (*), and collections.OrderedDict remembers the order in which items were first seen. Let's use multiple inheritance to combine the two:
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

def first_unique(iterable):
    c = OrderedCounter(iterable)
    for item, count in c.items():
        if count == 1:
            return item
    return None

print(first_unique('aabccbdcbe'))
#d
print(first_unique('abccbdcbe'))
#a
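Counter uses its superclass dict to store the counts. Defining class OrderedCounter(Counter, OrderedDict) inserts OrderedDict between Counter and dict in the method resolution order, which adds the ability to remember insertion order.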
(*) This is O(n), and efficient in that sense, but it is not the fastest solution, as the benchmark shows.
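On Python 3.7 and later, Counter, as a dict subclass, already iterates in first-insertion order, so the OrderedDict mix-in is no longer needed there. A sketch of the same function for those versions:

```python
from collections import Counter

def first_unique(iterable):
    # Python 3.7+: Counter keeps first-insertion order like any dict,
    # so iterating its items visits elements in order of first appearance
    for item, count in Counter(iterable).items():
        if count == 1:
            return item
    return None

print(first_unique('aabccbdcbe'))  # d
print(first_unique('abccbdcbe'))   # a
```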
Answer 5 (score: 0)
Here is an approach using a set of good characters and a set of bad characters (those that appear more than once):
import timeit
import random

# test string: 100 copies of each letter 'a'..'y' plus a single 'z', shuffled
s = [chr(i) for i in range(ord('a'), ord('z')) for j in range(100)] + ['z']
random.shuffle(s)
s = ''.join(s)

def good_bad_sets(s):
    setbad = set()    # characters seen more than once
    setgood = set()   # characters seen exactly once so far
    for char in s:
        if char not in setbad:
            if char in setgood:
                setgood.remove(char)
                setbad.add(char)
            else:
                setgood.add(char)
    # guard on setgood (not len(s)): min() of an empty sequence raises ValueError
    return s[min(s.index(char) for char in setgood)] if setgood else None

def app_once(s):
    seen = set()
    only_appear_once = set()
    for i in s:
        if i in seen:
            only_appear_once.discard(i)
        else:
            seen.add(i)
            only_appear_once.add(i)
    return s[min(s.index(char) for char in only_appear_once)] if only_appear_once else None

print('Good bad sets: %ss' % timeit.Timer(lambda: good_bad_sets(s)).timeit(100))
print("Oin's approach: %ss" % timeit.Timer(lambda: app_once(s)).timeit(100))
print('LC: %ss' % timeit.Timer(lambda: [a for a in s if s.count(a) == 1][0]).timeit(100))
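I compared it with the LC (list comprehension) approach, and at around 50 characters the good/bad sets approach becomes faster. Here is how it compares with Oin's approach and LC: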
Good bad sets: 0.0419239997864s
Oin's approach: 0.0803039073944s
LC: 0.647999048233s
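The final step above calls s.index() once per surviving good character, and each of those calls is its own scan of s. A variant sketch (the name good_bad_sets_scan is hypothetical) keeps the same two sets but finishes with a single left-to-right scan that stops at the first character still in setgood:

```python
def good_bad_sets_scan(s):
    setbad, setgood = set(), set()
    for char in s:
        if char in setbad:
            continue                 # already known to repeat
        if char in setgood:          # second occurrence: move to bad
            setgood.remove(char)
            setbad.add(char)
        else:                        # first occurrence
            setgood.add(char)
    # one scan in original order instead of one s.index() call per good char
    return next((char for char in s if char in setgood), None)

print(good_bad_sets_scan('aabccbdcbe'))  # d
```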
Answer 6 (score: -1)
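From the definition of the problem, it is clear that you need an O(n) solution, which means passing through the list only once. All solutions that use some form of count are wrong, because that operation passes through the list again. So you need to keep track of the counts yourself.
If the string contains only characters, you don't need to worry about storage: you can use the characters themselves as keys in a dictionary. The values in that dict will be the indices of the characters in the string s. At the end, we find which one came first by taking the minimum of the dictionary's values. This is an O(n) operation on a list that is (probably) shorter than the first one.
The total is still O(c * n), and therefore O(n).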
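from operator import itemgetter

seen = set()
only_appear_once = dict()    # char -> index of its single occurrence so far
for i, x in enumerate(s):
    if x in seen:
        # repeated: drop it; pop() with a default so a third occurrence
        # does not fall through and re-add the character
        only_appear_once.pop(x, None)
    else:
        seen.add(x)
        only_appear_once[x] = i
# the remaining entry with the smallest index is the answer
first_count_of_one = min(only_appear_once.items(), key=itemgetter(1))[0]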
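Wrapped as a function (the name first_unique_single_pass is hypothetical), the single-pass idea can be checked on an input where a character occurs three times. Testing only seen and popping with a default keeps the third occurrence from making the character a candidate again, and an empty dict is handled before calling min().

```python
from operator import itemgetter

def first_unique_single_pass(s):
    seen = set()
    only_appear_once = {}              # char -> index of its single occurrence
    for i, x in enumerate(s):
        if x in seen:
            only_appear_once.pop(x, None)   # 2nd, 3rd, ... occurrence
        else:
            seen.add(x)
            only_appear_once[x] = i
    if not only_appear_once:           # every character repeats
        return None
    # smallest stored index = first non-repeating character
    return min(only_appear_once.items(), key=itemgetter(1))[0]

print(first_unique_single_pass('aabccbdcbe'))  # d
print(first_unique_single_pass('aaab'))        # b
```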