Question

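I want to generate random strings (or arrays) of 1s and 0s, and then classify them by the number (count) of 1s they contain. I want the generated strings to be uniformly distributed across the possible counts.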

But the code below gives a bell-shaped, roughly normal (strictly speaking, binomial) distribution of counts:

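import numpy as np

# Example sizes; the question does not give concrete values.
num_examples = 10000
seq_length = 10
sequences = np.empty((num_examples, seq_length), dtype='float32')

for i in range(num_examples):
    seq = np.random.randint(2, size=seq_length).astype('float32')
    sequences[i] = seq

target_classes = []
for seq in sequences:
    # Class label = number of ones in this sequence
    target = (seq == 1).sum()
    target_classes.append(target)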
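The bell shape follows from how the bits are drawn: each element is an independent fair coin flip, so the count of ones in a sequence of length n is binomial, with P(k ones) = C(n, k) / 2^n. A minimal sketch that computes these probabilities (the function name `binomial_count_probs` is made up for illustration) shows the middle counts dominating, whereas a uniform distribution would give every count 1/(n+1):

```python
from math import comb

def binomial_count_probs(seq_length):
    """Hypothetical helper: P(count of ones == k) for k = 0..seq_length,
    assuming every bit is an independent fair coin flip."""
    total = 2 ** seq_length
    return [comb(seq_length, k) / total for k in range(seq_length + 1)]

probs = binomial_count_probs(10)
for k, p in enumerate(probs):
    print(f"{k:2d} ones: {p:.4f}")
```

For n = 10, the extremes (0 or 10 ones) each have probability 1/1024, while 5 ones has 252/1024, which is why the histogram of counts is far from flat.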

The histogram of the counts (image not included in this copy) is bell-shaped rather than flat.

A NumPy solution would be great. Or do I need regular expressions or something else?

Answer 1

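As @Prune already pointed out, this is basically a two-step process. First, draw the number of ones from a uniform distribution (e.g. with np.random.randint), and then set that many elements of each sequence to one (e.g. with np.random.choice).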

One possibility is:

import numpy as np

NUM_EXAMPLES = 10000
SEQ_LENGTH = 10

sequences = np.zeros((NUM_EXAMPLES, SEQ_LENGTH), dtype=np.int8)
# Number of ones in each sequence, uniform on 0..SEQ_LENGTH inclusive
number_of_1s = np.random.randint(0, SEQ_LENGTH+1, size=NUM_EXAMPLES)

indices = np.arange(SEQ_LENGTH)
for idx, num_ones in enumerate(number_of_1s.tolist()):
    # Set "num_ones" distinct positions to 1 using "choice" without replacement.
    sequences[idx][np.random.choice(indices, num_ones, replace=False)] = 1
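The Python loop above calls np.random.choice once per row, which can get slow for large NUM_EXAMPLES. A vectorized sketch of the same two-step idea, assuming NumPy 1.17 or newer for `np.random.default_rng` (the helper name `uniform_count_sequences` is hypothetical): draw the count for each row, give each row an independent random permutation of its positions, and mark the positions whose permuted value is below that row's count.

```python
import numpy as np

def uniform_count_sequences(num_examples, seq_length, rng=None):
    """Hypothetical helper: 0/1 rows whose number of ones is uniform on 0..seq_length."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: number of ones per row, uniform on 0..seq_length (high is exclusive).
    number_of_1s = rng.integers(0, seq_length + 1, size=num_examples)
    # Step 2: argsort of random floats gives an independent random permutation per row.
    perms = rng.random((num_examples, seq_length)).argsort(axis=1)
    # Each row is a permutation of 0..seq_length-1, so exactly number_of_1s[i]
    # of its entries are below number_of_1s[i], at uniformly random positions.
    sequences = (perms < number_of_1s[:, None]).astype(np.int8)
    return sequences, number_of_1s

sequences, number_of_1s = uniform_count_sequences(10000, 10, np.random.default_rng(0))
# Frequency of each count 0..10; these should all be roughly equal.
print(np.bincount(sequences.sum(axis=1), minlength=11))
```

The design trades the per-row choice call for one (NUM_EXAMPLES, SEQ_LENGTH) array of random floats, which is usually a good deal for short sequences.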

A histogram of the counts shows that they are uniformly distributed:

import matplotlib.pyplot as plt

plt.hist(np.sum(sequences == 1, axis=1), bins=np.arange(SEQ_LENGTH + 2) - 0.5, histtype='step')
plt.show()

Answer 2

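If you want the number of 1s to be evenly distributed, I think you'll find it easiest to generate the count first and then randomly distribute that many 1s through the binary representation. This two-step process is almost a necessity.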

With that hint, can you code it yourself?
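One possible reading of that hint, sketched with only the standard library and producing strings rather than arrays (the helper name `random_bitstring_uniform_count` is hypothetical, and building the string from an integer's binary representation is an interpretation, not necessarily what the answerer had in mind): pick the count of 1s uniformly, pick that many distinct bit positions with `random.sample`, set those bits in an integer, and format it as a zero-padded binary string.

```python
import random

def random_bitstring_uniform_count(seq_length, rng=random):
    """Hypothetical helper (assumes seq_length >= 1): a '0'/'1' string whose
    number of 1s is uniform on 0..seq_length."""
    # Step 1: choose how many 1s, uniformly from 0..seq_length (randint is inclusive).
    num_ones = rng.randint(0, seq_length)
    # Step 2: choose which bit positions hold them, without repetition.
    value = 0
    for pos in rng.sample(range(seq_length), num_ones):
        value |= 1 << pos
    # Binary representation, zero-padded to exactly seq_length characters.
    return format(value, f"0{seq_length}b")

rng = random.Random(42)
samples = [random_bitstring_uniform_count(8, rng) for _ in range(5)]
print(samples)
```

Passing a seeded `random.Random` instance makes the output reproducible; the module-level `random` is used when no generator is given.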

Uniformly distributed random strings in Python
