Generating a randomly weighted string file in Python

Date: 2016-12-21 07:42:21

Tags: python python-2.7 python-3.x random

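I am trying to generate a string of length 3900 from the characters ['A', 'B', 'C', 'D', 'E'], where each character should occur with the following probability: {'A': 0.1, 'B': 0.3, 'C': 0.3, 'D': 0.1, 'E': 0.2}. I wrote the following code: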

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]
string_ = ''
for i in range(0,3900):
    string_ = string_ + weighted_choice([("A",10), ("B",30), ("C",30),("D",10),("E",20)])

with open("rand_file","w") as f:
    f.write(string_)

But it does not generate the string (file) according to these probabilities. The probabilities it produces are as follows:

C 0.2500264583 
B 0.2499284457 
E 0.1666428313 
D 0.0833782424 
A 0.0833758065 

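I think the probabilities come out this way because each iteration of the for loop runs on its own, without taking the previous results into account.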

Any help solving this problem, please?

4 answers:

Answer 0: (score: 3)

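If you simply use the list ['A','B','B','B','C','C','C','D','E','E'] and pick an item from it at random, you can get rid of all the weighting logic in your code, because the weighting is built into the list itself.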

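You can see this in the following example (yes, I have no doubt it could be written better, but it is only a proof of concept, not production-ready, pure-as-the-driven-snow code):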

from random import random, seed

def choice(lst):
    return lst[int(random() * len(lst))]

seed()

(a, b, c, d, e, t) = (0, 0, 0, 0, 0, 0)

for i in range(1000):
    x = choice('ABBBCCCDEE')
    if (x == 'A'): a += 1
    if (x == 'B'): b += 1
    if (x == 'C'): c += 1
    if (x == 'D'): d += 1
    if (x == 'E'): e += 1
    t += 1

print ("a =", a, "which is", a * 100 / t, "%")
print ("b =", b, "which is", b * 100 / t, "%")
print ("c =", c, "which is", c * 100 / t, "%")
print ("d =", d, "which is", d * 100 / t, "%")
print ("e =", e, "which is", e * 100 / t, "%")

The output (roughly) matches the desired distribution:

a = 101 which is 10.1 %
b = 297 which is 29.7 %
c = 299 which is 29.9 %
d = 102 which is 10.2 %
e = 201 which is 20.1 %

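If your distribution were something like 99.9% A and 0.1% B, this would obviously get annoying (that is a rather long string to pass to choice()), but it should be good enough for your distribution.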
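For heavily skewed weights such as the 99.9% / 0.1% case, one alternative is to pass the weights to the standard library rather than spelling them out in a string. The following is a minimal sketch, assuming Python 3.6 or later (where random.choices is available). The names weights and generate, and the fixed seed, are illustrative and not part of the answer above.

```python
import random

# Target probabilities from the question (assumed to sum to 1).
weights = {'A': 0.1, 'B': 0.3, 'C': 0.3, 'D': 0.1, 'E': 0.2}

def generate(n, weights, rng=random):
    """Return a string of n characters, each drawn independently by weight."""
    letters = list(weights)
    probs = [weights[c] for c in letters]
    # random.choices draws k items with replacement using the given weights.
    return ''.join(rng.choices(letters, weights=probs, k=n))

rng = random.Random(12345)  # fixed seed only so that runs are repeatable
s = generate(3900, weights, rng)

# Print the observed frequency of each letter.
for c in sorted(weights):
    print(c, s.count(c) / len(s))
```

Because every character is drawn independently, the observed frequencies should be close to the targets without matching them exactly.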

Answer 1: (score: 1)

You can generate all the letters according to their weights, then shuffle them randomly, and finally join them. Something like this:

from random import shuffle
N = 3900 # the string length
doc = {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2 } #weights
letters = []
for key in doc.keys():
    m = int(doc[key] * N) #generate correct number of letter
    letters.append(list(key * m))

letters = [item for sublist in letters for item in sublist] # flatten the list
shuffle(letters) # shuffle all letters randomly
result = ''.join(letters) # join all letter to make one string

print(len(result))
# 3900
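The approach above gives exact proportions only when every weight times N is a whole number; otherwise truncation with int() can leave the result shorter than N. One possible way around this is sketched below, assuming integer weights like those in the question's own code. It hands the leftover positions to the letters with the largest remainders. The function name exact_weighted_string is hypothetical.

```python
import random

def exact_weighted_string(weights, n, rng=random):
    """Build a shuffled string of length n with counts proportional to integer weights."""
    total = sum(weights.values())
    # Integer division gives the floor of each letter's ideal count w * n / total.
    counts = {k: w * n // total for k, w in weights.items()}
    leftover = n - sum(counts.values())
    # Letters with the largest remainders receive the leftover positions.
    by_remainder = sorted(weights, key=lambda k: (weights[k] * n) % total, reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    # Expand the counts into a list of letters and shuffle it in place.
    letters = [k for k, c in counts.items() for _ in range(c)]
    rng.shuffle(letters)
    return ''.join(letters)

result = exact_weighted_string({'A': 10, 'B': 30, 'C': 30, 'D': 10, 'E': 20}, 3900)
print(len(result))
# 3900
```

Unlike independent draws, this fixes the letter counts in advance, so only the order of the letters is random.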

Answer 2: (score: 0)

This is essentially the same as paxdiablo's solution, only more general (for your simple example, his solution is better, +1):

import random

choice = [("A",10), ("B",30), ("C",30),("D",10),("E",20)]
choose_from = ''.join(x * letter for letter, x in choice)

print(choose_from)
#  AAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDetc...

print(random.choice(choose_from))
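The snippet above draws a single character. To build the full 3900-character string from the question, the draw can be repeated, as in this minimal sketch. The variable name string_ is borrowed from the question; everything else follows the answer's own names.

```python
import random

choice = [("A", 10), ("B", 30), ("C", 30), ("D", 10), ("E", 20)]
# Each letter appears in the pool as many times as its weight.
choose_from = ''.join(letter * weight for letter, weight in choice)

# Draw 3900 characters independently and uniformly from the pool.
string_ = ''.join(random.choice(choose_from) for _ in range(3900))

print(len(string_))
```

The resulting string can then be written to a file with open(...) and f.write(string_), exactly as in the question.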

Answer 3: (score: 0)

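Here is my solution; I hope it helps at least a little:

import random

letters = [1,2,2,2,3,3,3,4,5,5] # Each number represents a letter!
A = 0 # counter for the letter A, initialised so that A += 1 works
for n in range(0,3900):
    output = letters[random.randint(0,9)]
    if output == 1:
        A += 1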

Obviously you can add more if statements, but I am not sure whether this is what you were asking for.
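Instead of one if statement per letter, the tallying could also be done with a counter. This is a rough sketch using collections.Counter; the names dictionary that maps each number to a letter is an assumption, since the answer only says that each number represents a letter.

```python
import random
from collections import Counter

letters = [1, 2, 2, 2, 3, 3, 3, 4, 5, 5]  # same list as in the answer
names = {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'}  # assumed number-to-letter mapping

counts = Counter()
for n in range(3900):
    output = letters[random.randint(0, 9)]  # randint includes both endpoints
    counts[names[output]] += 1

# Print each letter with its count and observed frequency.
for letter in sorted(counts):
    print(letter, counts[letter], counts[letter] / 3900.0)
```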