In Python, does a set count as a buffer?

Time: 2016-12-27 20:34:05

Tags: python data-structures

I am working through Cracking the Coding Interview (4th edition), and one of the questions is as follows:

  

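Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.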

I wrote the following solution, which satisfies all the test cases specified by the author:

def remove_duplicate(s):
    return ''.join(sorted(set(s)))

print(remove_duplicate("abcd"))  # output "abcd"
print(remove_duplicate("aaaa"))  # output "a"
print(remove_duplicate(""))  # output ""
print(remove_duplicate("aabb"))  # output "ab"

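Does using a set in my solution count as using an additional buffer, or is my solution adequate? If it is not adequate, what would be a better approach?

Thank you very much!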

2 answers:

Answer 0: (score: 0)

Only the person who set the question or who grades the answers can say for sure, but I would say that a set does count as a buffer.

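If the string contains no repeated characters, the length of the set will equal the length of the string. In fact, because a set is built on a hash table and carries considerable overhead, it will probably take more memory than the string itself. If the string contains Unicode, the number of unique characters could be very large.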
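The memory claim can be checked roughly with `sys.getsizeof`. This is a minimal sketch that assumes CPython; the exact byte counts vary by version and platform, and the sample string is chosen only for illustration.

```python
import string
import sys

# Sample input with no repeated characters (52 ASCII letters).
s = string.ascii_letters
unique = set(s)

str_bytes = sys.getsizeof(s)
# getsizeof reports only the set's own hash table; the one-character
# string objects the set refers to are not included in this figure.
set_bytes = sys.getsizeof(unique)

print("characters:", len(s), "unique:", len(unique))
print("string bytes:", str_bytes, "set bytes:", set_bytes)
```

On CPython the set's table alone comes out larger than the string it was built from, even before counting its elements.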

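If you do not know how many unique characters the string contains, you cannot predict the size of the set. A set whose size may be large and is probably unpredictable should be regarded as a buffer, or as something worse, given that it may take more space than the string itself.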
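As for a better approach, one option that keeps only a couple of index variables is a quadratic scan that compacts the characters in place. The following is a sketch under assumptions, not the book's reference solution: Python strings are immutable, so a list of characters stands in for the character array, and the name `remove_duplicates_in_place` is made up for this example.

```python
def remove_duplicates_in_place(chars):
    """Remove repeated characters from a list in place.

    Keeps the first occurrence of each character and returns the new
    length. Uses only the index variables tail, i and j.
    """
    tail = 0  # chars[0:tail] holds the unique characters kept so far
    for i in range(len(chars)):
        # Scan the kept prefix by index (slicing would create a copy).
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:
            # chars[i] was not found in the prefix, so keep it.
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # drop the leftover slots at the end
    return tail


chars = list("baab")
remove_duplicates_in_place(chars)
print(''.join(chars))
```

This runs in O(n^2) time in exchange for constant extra space. It also preserves the order of first occurrence, whereas `''.join(sorted(set(s)))` returns the characters sorted, so "baab" would become "ab" rather than "ba".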

Answer 1: (score: 0)

To follow up on v.coder's comment, I rewrote the code he (or she) referenced in Python and added some comments to try to explain what is going on.

def removeduplicates(s):
    """Original java implementation by
          Druv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
       in his/her answer
          http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
      """
    # Python strings are immutable, so first convert the string to a list of
    # integers, each one the code point of a character (for ASCII characters
    # this is the ASCII value; hint: look up "ascii table" on the web)
    L = [ord(char) for char in s]

    # easiest solution is to use a set, but to use Druv Gairola's method...
    # (hint, look up "bitmaps" on the web to learn more!)
    bitmap = 0
    #seen = set()

    for index, char in enumerate(L):
        # first check for duplicates:
        # number of bits to shift left. The space is the "lowest"
        # printable character in the ascii table, and 'char' here is the
        # position of the current character in that table, so if 'char'
        # is a space the shift length will be 0, if 'char' is '!' the
        # shift length will be 1, and so on. This requires the integer
        # to have as many bit positions as there are characters from
        # the space to the ~. Python integers have arbitrary precision,
        # so that is fine. Note that a character below the space (such
        # as a newline) gives a negative shift, which raises ValueError.
        shift_length = char - ord(' ')

        # make a new integer where only one bit is set;
        # the bit position the character corresponds to
        bit_position = 1 << shift_length

        # if the same bit is already set [to 1] in the bitmap,
        # the result of AND'ing the two integers together
        # will be an integer where only that exact bit is
        # set, which means the result is greater than zero.
        # (Python integers have no fixed width, so there is no
        # "sign bit" for a large shift to overflow into.)
        bit_position_already_occupied = (bitmap & bit_position) > 0

        if bit_position_already_occupied:
        #if char in seen:
            L[index] = 0
        else:
            # update the bitmap to indicate that this character
            # is now seen. 'bit_position' already has the single bit
            # for this character set, so "add" it to the bitmap by
            # OR'ing the two integers: every bit that is set to 1 in
            # *either* of the two will be set to 1 in the result.
            bitmap = bitmap | bit_position
            #seen.add(char)

    # finally, turn the list back to a string to be able to return it
    # (again, just kind of a way to "get around" immutable python strings)
    return ''.join(chr(i) for i in L if i != 0)


if __name__ == "__main__":
    print(removeduplicates('aaaa'))
    print(removeduplicates('aabcdee'))
    print(removeduplicates('aabbccddeeefffff'))
    print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][   ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
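To look at the bit operations on their own, here is a small sketch of the test-and-set step. The helper name `mark_seen` and its `base` parameter are made up for this illustration and are not part of the code above.

```python
def mark_seen(bitmap, ch, base=' '):
    """Return (already_seen, updated_bitmap) for the character ch."""
    # One bit per character, counted from `base`. A character that sorts
    # below `base` gives a negative shift, which raises ValueError.
    bit = 1 << (ord(ch) - ord(base))
    already_seen = (bitmap & bit) != 0
    return already_seen, bitmap | bit


bitmap = 0
seen_a, bitmap = mark_seen(bitmap, 'a')        # first 'a'
seen_a_again, bitmap = mark_seen(bitmap, 'a')  # second 'a'
seen_tilde, bitmap = mark_seen(bitmap, '~')    # first '~'

print(seen_a, seen_a_again, seen_tilde)
print(bitmap.bit_length())
```

The '~' sets bit 94, so the bitmap ends up 95 bits long. That is wider than a 64-bit machine word, and Python's arbitrary-precision integers handle it without any special treatment.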