Mixing two lists of items so the result looks natural rather than artificial

Time: 2013-06-06 16:16:35

Tags: algorithm sorting

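I have two lists of items from different categories, say A and B, with m A's and n B's. I want to mix the two lists into a single list so that the result preserves the order of the A's and the order of the B's, but combines them in a way that does not look artificial.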

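If m and n are similar, a dumb version would be to alternate A B A B, but that looks unnatural. Something like A A B A B A B B A A looks less fake. In most cases there are more A's than B's, but that is not guaranteed. Typically there are 125 A's and 50 B's, never more, though filtering can reduce the counts to as few as 1.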

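I have built a version based on the m/n ratio, but of course it is very regular. I tried adding some random elements to it, but it still does not look quite right.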

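What looks right is obviously subjective, and the code would clearly be easier to write if there were a solid statistical basis for it. Any ideas are welcome. Even telling me the right search terms for Google would help, if there is a branch of mathematics or statistics that deals with this.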

I am writing this in Objective-C, but I don't need code, just an algorithm or ideas.

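Update: I looked into the various things suggested, but some were too complicated, particularly things like Sobol sequences. What I am doing now is using a random algorithm (add the total A's and B's together, pick a random int from 0 to total - 1, and if it is less than the total number of A's, pick an A), but I added a check to make sure that no more than 2 B's appear in a row (since the B count is practically always less than half the A count). It is still not perfect, but it does look less arbitrary. It can end up with surplus B's at the end, but from a business point of view those are less desirable anyway. Sobol and the like would ensure better mixing, but that is overkill for this.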
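
The approach in the update could be sketched in Python roughly as follows. This is only a guess at the logic described, not the asker's actual Objective-C code: the function name `mix_capped`, the `max_b_run` parameter, and the choice to weight by the remaining counts rather than the original totals are all assumptions.

```python
import random


def mix_capped(a_items, b_items, max_b_run=2, rng=None):
    """Merge two lists, keeping the internal order of each.

    Assumption: an A is picked with probability
    (A's remaining) / (items remaining), and a B is refused once
    max_b_run B's have appeared in a row, as long as an A is left.
    """
    rng = rng or random.Random()
    a_pos = 0   # index of the next unused A
    b_pos = 0   # index of the next unused B
    b_run = 0   # number of B's emitted in a row so far
    merged = []
    while a_pos < len(a_items) or b_pos < len(b_items):
        a_left = len(a_items) - a_pos
        b_left = len(b_items) - b_pos
        # Random int from 0 to total - 1; below the A count means "pick A".
        pick_a = rng.randrange(a_left + b_left) < a_left
        # The extra check: break up a run of B's if an A is available.
        if not pick_a and b_run >= max_b_run and a_left > 0:
            pick_a = True
        if pick_a:
            merged.append(a_items[a_pos])
            a_pos += 1
            b_run = 0
        else:
            merged.append(b_items[b_pos])
            b_pos += 1
            b_run += 1
    return merged
```

Once the A's run out, any B's still left are emitted in one block, which matches the surplus B's at the end mentioned in the update.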

3 answers:

Answer 0: (score: 3)

Given m A's and n B's:

while (m + n > 0) {
  float r = a random number in the range 0..1;
  if (r < m / (m + n)) {  // use floating point arithmetic
    choose the next A;
    --m;
  } else {
    choose the next B;
    --n;
  }
}
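
For reference, the pseudocode above might look like this as runnable Python. The function name `random_merge` and the optional `rng` argument are illustrative additions, not part of the answer. Weighting each draw by the remaining counts makes every order-preserving interleaving equally likely.

```python
import random


def random_merge(a_items, b_items, rng=None):
    """Randomly interleave two lists, keeping the order within each."""
    rng = rng or random.Random()
    a_left = len(a_items)  # m in the pseudocode
    b_left = len(b_items)  # n in the pseudocode
    merged = []
    while a_left + b_left > 0:
        # Python 3 "/" is floating point division, as the pseudocode requires.
        if rng.random() < a_left / (a_left + b_left):
            merged.append(a_items[len(a_items) - a_left])  # next A
            a_left -= 1
        else:
            merged.append(b_items[len(b_items) - b_left])  # next B
            b_left -= 1
    return merged
```

Because the result is uniformly random, long runs of A's or B's will still turn up now and then, which may be part of why plain randomness does not always look right to the eye.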

Answer 1: (score: 0)

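One approach is to sample uniformly at random from the words that have the right letter counts and are accepted by a specified deterministic automaton. The algorithm is a dynamic program over the automaton state and the remaining symbol counts. Here is some sample output with 20 a's and 20 b's: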

abbaabbbaabbbabaaabbaaababbaaabbbabbbaaa
bbaaababababbaaabbababababbbabaaababaabb
bbababbbaaabaaabbbabaabaaabbbaababbababa
ababbbabbababbbaabababbaababaabbaaababaa
bbaaababbababbabaabbababaabababaabababba
bbaabbababbbaabbababaaabaababbbaababaaab
babaabaabbababbababbababbaababaaababbaba
aaabababaababbabbababbbaabbababaabbaaabb
babababbabaaababababababaababbbaabbaabba
bbabaabababababbabaababaababbbaabbabaaba

Here is the Python that produced these.

from collections import namedtuple
from itertools import product, repeat
from random import random


"""
deterministic finite automata
delta is a dict from state-symbol pairs to states
q0 is the initial state
F is the set of accepting states
"""
DFA = namedtuple('DFA', ('delta', 'q0', 'F'))


"""accepts strings with no runs of length 4"""
noruns4 = DFA(
    delta={
        ('0', 'a'): '1a',
        ('0', 'b'): '1b',
        ('1a', 'a'): '2a',
        ('1a', 'b'): '1b',
        ('1b', 'a'): '1a',
        ('1b', 'b'): '2b',
        ('2a', 'a'): '3a',
        ('2a', 'b'): '1b',
        ('2b', 'a'): '1a',
        ('2b', 'b'): '3b',
        ('3a', 'a'): '4',
        ('3a', 'b'): '1b',
        ('3b', 'a'): '1a',
        ('3b', 'b'): '4',
        ('4', 'a'): '4',
        ('4', 'b'): '4'},
    q0='0',
    F={'0', '1a', '1b', '2a', '2b', '3a', '3b'})


def accepts(dfa, s):
    """returns whether dfa accepts s"""
    q = dfa.q0
    for c in s:
        q = dfa.delta[(q, c)]
    return q in dfa.F


def testaccepts():
    for n in range(10):
        for cs in product(*repeat('ab', n)):
            s = ''.join(cs)
            # the DFA must reject exactly the strings that contain a run of 4
            if accepts(noruns4, s) == ('aaaa' in s or 'bbbb' in s):
                print(s)
                assert False


testaccepts()


def acceptedstrcnts(dfa, syms, cnts, memo=None, q=None):
    """
    counts the number of strings accepted by dfa,
    subject to the constraint of having the specified number of symbols
    """
    if memo is None:
        memo = {}
    if q is None:
        q = dfa.q0
    key = (q,) + tuple(cnts)
    if key not in memo:
        if sum(cnts) > 0:
            total = 0
            for (i, cnt) in enumerate(cnts):
                if cnt > 0:
                    newcnts = list(cnts)
                    newcnts[i] -= 1
                    newq = dfa.delta[(q, syms[i])]
                    total += acceptedstrcnts(dfa, syms, newcnts, memo, newq)
        else:
            total = 1.0 if q in dfa.F else 0.0
        memo[key] = total
    return memo[key]


print(acceptedstrcnts(noruns4, 'ab', (125, 50)))
memo = {}
acceptedstrcnts(noruns4, 'ab', (4, 4), memo)
# memo[('0', 4, 4)] is 62.0: there are 62 strings with 4 a's, 4 b's,
# and no run of length 4
print(memo)


def memoget(memo, q, cnts):
    return memo[(q,) + tuple(cnts)]


def samplestrcnts(dfa, syms, cnts, memo):
    """
    uses the memoization dict to sample the counted words
    modulo roundoff error, the sampling is uniform
    """
    cnts = list(cnts)
    cs = []
    q = dfa.q0
    while sum(cnts) > 0:
        denom = memoget(memo, q, cnts)
        outcome = random()
        j = None
        for (i, cnt) in enumerate(cnts):
            if cnt > 0:
                j = i  # default in case roundoff bites us
                newcnts = list(cnts)
                newcnts[i] -= 1
                newq = dfa.delta[(q, syms[i])]
                numer = memoget(memo, newq, newcnts)
                ratio = numer / denom
                if outcome < ratio:
                    break
                outcome -= ratio
        cnts[j] -= 1
        cs.append(syms[j])
        q = dfa.delta[(q, syms[j])]
    return ''.join(cs)


acceptedstrcnts(noruns4, 'ab', (20, 20), memo)
for k in range(10):
    print(samplestrcnts(noruns4, 'ab', (20, 20), memo))
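
The sampled strings are only patterns of a's and b's. To turn one into an actual merged list, something like the following sketch could be used. The helper name `apply_pattern` is hypothetical and not part of the answer's code; it assumes the pattern contains only the characters 'a' and 'b'.

```python
def apply_pattern(pattern, a_items, b_items):
    """Build the merged list described by a pattern such as 'aabab'.

    Each 'a' takes the next item from a_items and each 'b' the next
    item from b_items, so the order within each list is preserved.
    """
    if pattern.count('a') != len(a_items) or pattern.count('b') != len(b_items):
        raise ValueError('pattern does not match the list lengths')
    a_iter = iter(a_items)
    b_iter = iter(b_items)
    return [next(a_iter) if c == 'a' else next(b_iter) for c in pattern]


print(apply_pattern('aabab', [1, 2, 3], ['x', 'y']))
# prints [1, 2, 'x', 3, 'y']
```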

Answer 2: (score: 0)

Here is another approach, based on Metropolis-Hastings.

from math import log2
from random import randrange


def simscore(lst, j):
    score = 0
    if j > 0 and lst[j] == lst[j - 1]:
        score += 1
    if j < len(lst) - 1 and lst[j] == lst[j + 1]:
        score += 1
    return score


def mix(lst):
    n = len(lst)
    for i in range(len(lst) * (100 + round(log2(n + 1)))):
        j = randrange(n)
        k = randrange(n)
        oldscore = simscore(lst, j) + simscore(lst, k)
        (lst[j], lst[k]) = (lst[k], lst[j])
        newscore = simscore(lst, j) + simscore(lst, k)
        if not (newscore <= oldscore or randrange(4 ** (newscore - oldscore)) == 0):
            (lst[j], lst[k]) = (lst[k], lst[j])


lst = list(125 * 'a' + 50 * 'b')
for i in range(10):
    mix(lst)
    print(''.join(lst))

Here are some samples:

ababababaaababaabaabbabaabaaaaabaaabaababaaaabababaabaababaaabaaabaaabaabaababaaaababaaabaaaaaaabaaabaaaaaaaaabaabaabaaaababaaaaaababababaaabaabaabaaababaabaabaaabaaaaaaaabaaa
aaaaaaabababaaaaabaaabaaabaabaaaaaababaaaabaaaabaaaaaabaaabababaaabaaaaaaabbaababaabaabababaabababaababaaabaababaaaaabaabaaaaaaaabaabaaaababaabaaaaaababaaabababbababababaabaaa
ababababaabaaabbababaaababbaaaabaabaaaabaabaaaababaabababaaababaaaabaaabaaaaaaabaaaabaaababaaaaaaaababaaaabaaababaaaaabaaaabaaaababaabaababaaabaaaaababaababaaaaabaabaabaabaaaa
aaaaaababababaaaaaabaaaabaabaaabababaaabaaaabaaababaabaaaaaaaababaababaaaaabaaabaababaaaaabaaaabababaaaababaabababababbaaabaaaaabbaaaaaabababbaaabaabaaabaaaaaabbaaaaaabaaababa
ababaababaaababababaabaaaaaaabaababaabaaaaaaaaabaabaabaababaabaababababaabaabababababaaabaabababaaaaaaabaabaaaabababaaaaaaaabaaaaaaaabaaaaaaaababaaaaabbaaababaaabaaaaaaababaab
baababaabaabaaabababaaaabaabaababaaaababaabaaaaaabababaabaaaaaaaababaaaaabababaaaabaabababababababaaaaaababaaaabaaaaaaabaaabaaabaaaabaabaaaaaababaaaaaaababaababaabababaaaaaaab
aabaabaaaabababaabaababaaaaabaaaaabaabaaaaababaaababaaababaaaaababaaabaaabaaaabaabababababaaaabaabbabaabaabaabaababaabaabaaaabaaababaaabaabaaaaaabababaaaaaaaabaaaaaaabaaabaaab
babaaaaaababbaaaabababaaaaabaaabababbaaaabaabaaababaabababaabaaabaababaaababaaabaaabaabaababaaaaaaaaaabaaaaaababaabaaabaabaababaaabababbaaaaaabaaaaaaabaaaaaaaabaaaaababaabaaba
aabaaabaaaaaabaababaabaaaaaaaaaaaabababaaababaababaababaaabaabaaabaabaabaaaaabaabaaaabaaabaabaabaababaabaabaabaaaaaaabaabbabaaaabaabaabaaaaaabaaababaaaabaaabaaabbababaabaababa
baaaabababaaaabaaababaabaaaababaaaaabaaaaaaabaaabababbaabababaaaabaabaaaaaabaaaabababababbaaabaaaaabaaaaaabaabaaabaaaaaaaaabaababbaabababaaaabaabaabaababaabababaaaaaaabaaabaaa
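
Since what looks right is subjective, one rough way to compare outputs from the different approaches is to tabulate their run lengths. The helpers below are a sketch only; the names `run_lengths` and `longest_run` are made up for illustration and do not come from any of the answers.

```python
from collections import Counter
from itertools import groupby


def run_lengths(pattern):
    """Count runs by (symbol, length), e.g. ('a', 2) for each 'aa' run."""
    return Counter((sym, len(list(group))) for sym, group in groupby(pattern))


def longest_run(pattern, sym):
    """Length of the longest run of sym in pattern, or 0 if sym is absent."""
    return max((n for (s, n) in run_lengths(pattern) if s == sym), default=0)


print(longest_run('aaabbaaaa', 'a'))
# prints 4
```

A pattern that looks too regular would show almost all of its runs at one or two lengths, while a uniformly random merge would show a longer tail of long runs.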