Question

Is it possible to select an element from a list following a Zipf distribution using Python?

Suppose I have a list:

objlist = ['Here', 'in', 'the', 'wall', 'why']

So far I have looked at https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.zipf.html but I cannot figure out a solution.

Thanks in advance.

Answer 1

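To choose according to the actual empirical Zipf distribution, you first need a frequency table of English words. If the 100,000 most common words are enough, you can get one here.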

It is a PDF, and plain text is easier to process, so convert it first. On Linux you can do

pdftotext freq100000.pdf

This creates a text file, freq100000.txt, which you can process with the following small script

import re
import numpy as np

# Each record looks like "<rank> <frequency> <word>"
record = re.compile('[0-9]+ [0-9]+ [a-z]+')
data = {}
with open('freq100000.txt') as f:
    for line in f:
        m = record.match(line.strip())
        if m is not None:
            rank, freq, word = m.group(0).split()
            data[word] = int(rank), int(freq)

def rel_freqs(wlist):
    # Frequency of each word (raises KeyError if a word is not in the table)
    freqs = np.array([data[word.lower()][1] for word in wlist])
    ps = np.add.accumulate(freqs)
    # side='right' gives word i exactly freqs[i] of the ps[-1] possible draws
    choice = np.searchsorted(ps, np.random.randint(ps[-1]), side='right')
    return choice

rel_freqs(['Here', 'in', 'the', 'wall', 'why'])

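The function rel_freqs randomly picks a word from the list and returns its index. The probability of drawing a word is proportional to its frequency of occurrence in English.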
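
To see the cumulative-sum lookup in isolation, here is a minimal sketch that does not need the frequency file. The counts in toy_counts are made-up placeholders, not real corpus frequencies, and pick_index is a hypothetical helper name. The seeded generator is only there to make the run repeatable.

```python
import numpy as np

# Made-up counts for illustration only; real values would come from the frequency table.
toy_counts = np.array([50, 30, 15, 4, 1])

def pick_index(counts, rng):
    """Return an index drawn with probability counts[i] / counts.sum()."""
    cumulative = np.cumsum(counts)
    # r is uniform on 0 .. total-1; side='right' finds the first cumulative sum > r
    r = rng.integers(cumulative[-1])
    return int(np.searchsorted(cumulative, r, side='right'))

rng = np.random.default_rng(0)
draws = [pick_index(toy_counts, rng) for _ in range(10000)]
observed = np.bincount(draws, minlength=len(toy_counts)) / len(draws)
print(observed)  # should be close to toy_counts / toy_counts.sum()
```

With enough draws, the observed shares should settle near 0.50, 0.30, 0.15, 0.04 and 0.01, which is the "proportional to frequency" behaviour described above.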

Answer 2

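Just use the output of numpy.random.zipf(shape_parameter) as an index into your list. However, there is the problem that the Zipf distribution is unbounded, so a value may be larger than the largest valid index. Therefore, put the lookup in a try: except: block. When you run the code multiple times, different values will be drawn from the list. However, because the Zipf distribution is unbounded and your list indices are not, the result will not be exactly Zipf-distributed.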

Sample code:

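import numpy as np

objlist = ['Here', 'in', 'the', 'wall', 'why']
index = np.random.zipf([1.2, 1.2])  # two samples, each an integer >= 1
for idx in index:
    # Zipf samples start at 1, so shift by one to reach objlist[0]
    if idx <= len(objlist):
        print(objlist[idx - 1])
    else:
        print("Index {} exceeds list".format(idx))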
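
If the goal is a Zipf-like choice that never falls outside the list, one alternative to discarding out-of-range samples is to normalise the weights 1/k^a over the list's ranks and pass them to a weighted choice. This is a sketch of a truncated Zipf distribution, not what numpy.random.zipf itself does. zipf_choice is a hypothetical helper name, and the sketch assumes the list is already ordered from most to least likely. The exponent a = 1.2 is simply carried over from the sample above.

```python
import numpy as np

def zipf_choice(items, a=1.2, size=None, rng=None):
    """Pick from items with P(rank k) proportional to 1 / k**a, k = 1..len(items)."""
    rng = np.random.default_rng() if rng is None else rng
    ranks = np.arange(1, len(items) + 1)
    weights = 1.0 / ranks ** a
    probs = weights / weights.sum()   # normalise so the probabilities sum to 1
    return rng.choice(items, size=size, p=probs)

objlist = ['Here', 'in', 'the', 'wall', 'why']
print(zipf_choice(objlist))           # a single word
print(zipf_choice(objlist, size=10))  # an array of ten words
```

Because the probabilities are normalised over the list itself, every draw is a valid element and no index check is needed.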

Wikipedia: Zipf Distribution

Answer 3

I hope I have not misunderstood your request. Here is my code:

import random
objlist = ['Here', 'in', 'the', 'wall', 'why']
print(random.choice(objlist))

Selecting an element from a list with a Zipf-like choice in Python
