Question

For example, say I have a string like:

duck duck duck duck goose goose goose dog

and I want it to be as sparsely populated as possible, say:

duck goose duck goose dog duck goose duck

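What algorithm would you recommend? Code snippets or general pointers are welcome. Python and C++ are preferred, with extra credit if you have a way to do it in bash.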

Answer 1

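I would sort the array by number of duplicates and, starting from the most duplicated element, spread those elements as far apart as possible.

In your example, duck is duplicated 4 times, so duck is placed at position n * 8/4 for n from 0 to 3 inclusive.

Then place the next most repeated one (goose) at positions n * 8/3 + 1 for n from 0 to 2 inclusive. If something is already placed there, just put it in the next empty spot. And so on.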
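A minimal sketch of the placement described above. The answer only gives the formulas for duck and goose, so two things here are assumptions: the starting offset grows by one for each further item type, and fractional positions are rounded down. The name `spread_by_count` is made up for illustration.

```python
from collections import Counter

def spread_by_count(items):
    """Place each item type at evenly spaced slots, most frequent type first."""
    total = len(items)
    slots = [None] * total
    # most_common() yields (item, count) pairs in descending count order
    for offset, (item, count) in enumerate(Counter(items).most_common()):
        for n in range(count):
            # ideal slot n * total / count, shifted by the assumed per-type offset
            pos = (n * total // count + offset) % total
            # if the slot is taken, fall through to the next empty one
            while slots[pos] is not None:
                pos = (pos + 1) % total
            slots[pos] = item
    return slots

print(spread_by_count("duck duck duck duck goose goose goose dog".split()))
```

For the example in the question this prints `['duck', 'goose', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose']`.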

Answer 2

I think something like this is the general idea:

L = sorted("duck duck duck duck goose goose goose dog".split())

from itertools import cycle, islice, groupby

# from: http://docs.python.org/library/itertools.html#recipes
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    # Python 3: bound __next__ methods replace the old .next
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # one iterable is exhausted; drop it from the cycle
            pending -= 1
            nexts = cycle(islice(nexts, pending))

# L is already sorted, so groupby collects equal words together
groups = [list(it) for k, it in groupby(L)]

# some extra print so you get the idea
print(L)
print(groups)
print(list(roundrobin(*groups)))

Output:

['dog', 'duck', 'duck', 'duck', 'duck', 'goose', 'goose', 'goose']
[['dog'], ['duck', 'duck', 'duck', 'duck'], ['goose', 'goose', 'goose']]
['dog', 'duck', 'goose', 'duck', 'goose', 'duck', 'goose', 'duck']

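So, you want some kind of round robin :-)

Well, round robin isn't perfect.

Here is the brute-force (a.k.a. horribly inefficient) version of what you were thinking of.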

from itertools import permutations

# this is the function we want to maximize
def space_sum(L):
    """ return the sum of all spaces between all elements in L"""
    unique = set(L)
    def space(val):
        """ count how many elements are between two val """
        c = 0
        # start right after the first occurrence of val, then count
        for x in L[1+L.index(val):]:
            if x == val:
                yield c
                c = 0
            else:
                c += 1
    return sum(sum(space(val)) for val in unique)

# L is the word list defined earlier
print(max((space_sum(v), v) for v in permutations(L)))

# there are tons of equally good solutions
print(sorted(permutations(L), key=space_sum, reverse=True)[:100])

Answer 3

How do you actually measure sparseness? By the way, a simple random shuffle might work.
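One possible way to make both points concrete, as a rough sketch: score a sequence by the smallest distance between two equal items, then keep the best of several random shuffles. The metric and the names `min_gap` and `best_of_shuffles` are assumptions chosen for illustration; the question does not define sparseness.

```python
import random

def min_gap(seq):
    """Smallest index distance between two equal items (len(seq) if none repeat)."""
    last_seen = {}
    best = len(seq)
    for i, item in enumerate(seq):
        if item in last_seen:
            best = min(best, i - last_seen[item])
        last_seen[item] = i
    return best

def best_of_shuffles(items, tries=200, seed=0):
    """Shuffle repeatedly and keep the arrangement with the largest min_gap."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = list(items)
    for _ in range(tries):
        candidate = list(items)
        rng.shuffle(candidate)
        if min_gap(candidate) > min_gap(best):
            best = candidate
    return best

words = "duck duck duck duck goose goose goose dog".split()
print(min_gap(words))  # adjacent ducks give a gap of 1
print(best_of_shuffles(words))
```

A shuffle gives no guarantee, so the second function only ever returns something at least as good as the input under this particular metric.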

Answer 4

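1. Sort the item types by count.
2. Place item type 1 in a linked list. (Store the middle link.)
3. Take the next item type, with count = c and total current list size = N. Distribute its c items through the list using "banker's rounding", starting from the middle of the list.
4. Go to 2.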
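The steps above are terse, so the following is only one loose reading of them. It assumes a plain Python list instead of a linked list, and it assumes "distribute" means inserting the c new items at evenly spaced positions in the current list. Python 3's built-in `round` rounds halves to the nearest even integer, which is banker's rounding. The name `interleave_by_count` is hypothetical.

```python
from collections import Counter

def interleave_by_count(items):
    """Insert each item type, most frequent first, at evenly spaced positions."""
    result = []
    for item, count in Counter(items).most_common():
        n = len(result)
        # centre of each of `count` equal segments of the current list;
        # round() applies banker's rounding to exact halves
        positions = [round((k + 0.5) * n / count) for k in range(count)]
        # insert from the back so the earlier indices stay valid
        for pos in reversed(positions):
            result.insert(pos, item)
    return result

print(interleave_by_count("duck duck duck duck goose goose goose dog".split()))
```

On the example from the question this yields `['duck', 'goose', 'duck', 'goose', 'dog', 'duck', 'goose', 'duck']`. Nothing in this sketch prevents two equal items from ending up adjacent on other inputs.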

Answer 5

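There are answers above about sorting and separating the most common strings. But if you have too much data to sort, or don't want to spend the time, look into quasirandom numbers (http://mathworld.wolfram.com/QuasirandomSequence.html). There is a simple implementation in the book Numerical Recipes. These are numbers that "look" random, i.e. they fill a space, but try to avoid each other as much as possible. They are used a lot in applications where you want to "randomly" sample something, but instead of true randomness you want to sample the whole space efficiently.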
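As a small illustration of the idea, here is a sketch using the van der Corput sequence, one well-known quasirandom (low-discrepancy) sequence. This is a swapped-in example, not the Numerical Recipes implementation mentioned above. The helper `scatter` is hypothetical and deliberately limited to power-of-two lengths, where the first n terms land on n distinct slots.

```python
def van_der_corput(n, base=2):
    """n-th term of the van der Corput sequence in the given base, in [0, 1)."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value

def scatter(items):
    """Send the i-th item to the slot picked by the i-th sequence term."""
    n = len(items)
    if n & (n - 1):
        raise ValueError("this sketch only handles power-of-two lengths")
    slots = [None] * n
    for i, item in enumerate(items):
        slots[int(van_der_corput(i) * n)] = item
    return slots

print([van_der_corput(i) for i in range(1, 5)])
print(scatter("duck duck duck duck goose goose goose dog".split()))
```

The first line prints `[0.5, 0.25, 0.75, 0.125]`: each new term falls into the largest remaining gap. Because the input already has equal words next to each other, consecutive terms push them far apart, and no sort is needed.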

Answer 6

If I understand your definition of "sparse" correctly, this function should be exactly what you want:

# Python 3
import itertools, heapq

def make_sparse(sequence):
    grouped = sorted(sequence)
    item_counts = []
    for item, item_seq in itertools.groupby(grouped):
        count = sum(1 for _ in item_seq)  # length of this run of equal items
        item_counts.append((-count, item))  # negative count for heapq purposes
    heapq.heapify(item_counts)

    # item1 is the item just emitted; it is held back for one step
    count1, item1 = heapq.heappop(item_counts)
    yield item1; count1 += 1
    while True:
        try:
            count2, item2 = heapq.heappop(item_counts)
        except IndexError:  # no other item remains
            break
        yield item2; count2 += 1
        if count1 < 0:
            heapq.heappush(item_counts, (count1, item1))
        item1, count1 = item2, count2

    # loop is done, produce remaining item1 items
    while count1 < 0:
        yield item1; count1 += 1

if __name__ == "__main__":
    # initial example
    print(list(make_sparse(
        "duck duck duck duck goose goose goose dog".split())))
    # updated example
    print(list(make_sparse([
        'duck', 'duck', 'duck', 'duck', 'duck', 'duck',
        'goose', 'goose', 'goose', 'goose', 'dog', 'dog'])))
    # now a hard case: item 'a' appears more than:
    # > total_len//2 times if total_len is even
    # > total_len//2+1 times if total_len is odd
    print(list(make_sparse("aaaaaabbcc")))

These examples produce this output:

['duck', 'goose', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose']
['duck', 'goose', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose']
['a', 'b', 'a', 'c', 'a', 'b', 'a', 'c', 'a', 'a']

A subtle note: in the first and second examples, reversing the output order might look more optimal.

Sorting the strings in an array so that it is sparsely populated
