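I am working through Cracking the Coding Interview (4th edition), and one of the questions is as follows:
Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.
I wrote the following solution, which satisfies all the test cases specified by the author: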
    def remove_duplicate(s):
        return ''.join(sorted(set(s)))

    print(remove_duplicate("abcd"))  # output "abcd"
    print(remove_duplicate("aaaa"))  # output "a"
    print(remove_duplicate(""))      # output ""
    print(remove_duplicate("aabb"))  # output "ab"
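Does using a set in my solution count as using an additional buffer, or is my solution adequate? If it is not adequate, what would be a better way to go about it?
Thank you very much!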
Answer 0 (score: 0)
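Only the person administering the question or evaluating the answer can say for sure, but I would say that a set does count as a buffer.
If there are no repeated characters in the string, the length of the set will equal the length of the string. In fact, because a set is backed by a hash table and carries significant overhead, it will probably take more memory than the string. If the string contains Unicode, the number of unique characters could be very large.
If you do not know how many unique characters are in the string, you cannot predict the size of the set. That potentially large and unpredictable size is what makes it count as a buffer, or something worse, given that it may take more memory than the string itself.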
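To make the memory argument concrete, here is a rough sketch that compares the size of a string with the size of the set built from it. It assumes CPython, where `sys.getsizeof` reports the size of the container object only (not the elements it refers to); `compare_sizes` and `sample` are names made up for this illustration.

```python
import sys


def compare_sizes(s):
    """Return (bytes used by the string object, bytes used by set(s))."""
    return sys.getsizeof(s), sys.getsizeof(set(s))


# The 95 printable ASCII characters, space through ~, each appearing once.
sample = "".join(chr(c) for c in range(ord(" "), ord("~") + 1))

str_bytes, set_bytes = compare_sizes(sample)

# With no duplicates the set has exactly as many elements as the string.
print(len(sample), len(set(sample)))
# The hash table behind the set is larger than the string it came from.
print(str_bytes, set_bytes)
```

The exact byte counts depend on the Python version and build, so treat them as an indication rather than a measurement.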
Answer 1 (score: 0)
To follow up on v.coder's comment, I rewrote the code he (or she) referenced in Python and added some comments to try to explain what is going on.
    def removeduplicates(s):
        """Original Java implementation by
        Dhruv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
        in his/her answer
        http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
        """
        # Python strings are immutable, so first convert the string to a list
        # of integers, each integer being the ASCII value of the letter
        # (hint: look up "ascii table" on the web)
        L = [ord(char) for char in s]

        # easiest solution is to use a set, but to use Dhruv Gairola's method...
        # (hint: look up "bitmaps" on the web to learn more!)
        bitmap = 0
        #seen = set()

        for index, char in enumerate(L):
            # first check for duplicates:
            # number of bits to shift left. The space is the "lowest" printable
            # character in the ASCII table, and 'char' here is the position of
            # the current character in that table. So if 'char' is a space the
            # shift length is 0, if 'char' is '!' it is 1, and so on. This
            # needs an integer with at least as many bit positions as there
            # are characters from the space to the ~; Python integers have
            # arbitrary precision, so that is fine. Characters below the space
            # (e.g. a newline) give a negative shift and raise ValueError.
            shift_length = char - ord(' ')

            # make a new integer where only one bit is set:
            # the bit position the character corresponds to
            bit_position = 1 << shift_length

            # if the same bit is already set [to 1] in the bitmap, AND'ing
            # the two integers gives an integer where only that exact bit is
            # set, so the result is greater than zero. (Python integers have
            # no fixed-width sign bit, so a large shift cannot make it negative.)
            bit_position_already_occupied = (bitmap & bit_position) > 0

            if bit_position_already_occupied:
                #if char in seen:
                L[index] = 0  # mark the duplicate for removal
            else:
                # update the bitmap to indicate that this character is now
                # seen. We "add" the bit by OR'ing the current bitmap with the
                # integer that has the required bit set to 1: every bit set
                # to 1 in *either* of the two is set to 1 in the result.
                bitmap = bitmap | bit_position
                #seen.add(char)

        # finally, turn the list back into a string to be able to return it
        # (again, just a way to "get around" immutable Python strings)
        return ''.join(chr(i) for i in L if i != 0)


    if __name__ == "__main__":
        print(removeduplicates('aaaa'))
        print(removeduplicates('aabcdee'))
        print(removeduplicates('aabbccddeeefffff'))
        print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][ ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
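If even a bitmap feels like cheating, one common alternative is to compare each character against the ones already kept, trading time (quadratic in the worst case) for memory. The sketch below is only an illustration of that idea, not the code from the linked answer. It assumes the input is first turned into a list, since Python strings are immutable, so the "no extra copy" rule applies to the list rather than to the string. The names `remove_duplicates_in_place` and `remove_duplicates` are made up here.

```python
def remove_duplicates_in_place(chars):
    """Remove duplicates from a list of characters in place.

    Keeps the first occurrence of each character, preserves order,
    and returns the new length. Uses only index variables.
    """
    tail = 0  # chars[:tail] holds the unique characters kept so far
    for i in range(len(chars)):
        # Linear scan of the kept prefix instead of a set or bitmap.
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:  # not found in the prefix: keep this character
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # drop the leftover tail
    return tail


def remove_duplicates(s):
    # The list is the mutable stand-in for the string; this conversion
    # is an assumption forced by Python's immutable strings.
    chars = list(s)
    remove_duplicates_in_place(chars)
    return "".join(chars)


if __name__ == "__main__":
    print(remove_duplicates("aabcdee"))
    print(remove_duplicates("abab"))
```

Unlike the bitmap version, this works for any characters, not just the printable ASCII range, and unlike `sorted(set(s))` it keeps the characters in their original order.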