Question

I'm trying to populate a list in Python 3 with 3 random items read from a file using a regex, but I keep getting duplicates in the list. Here is an example.

import re
import random as rn

data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
    index = inFile.read()
    URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)

    list_0 = []
    for i in range(3):
        list_0.append(URLS[rn.randint(1, 30)])
    inFile.close()

for i in range(len(list_0)):
    print(list_0[i])

What is the cleanest way to prevent duplicate items from being appended to the list?

(Edited) Here is the code that I think does the job.

def random_sample(data):
    r_e = ['https://www\.\w{1,10}\.com/view\?i=\w{1,20}', '..']
    with open(data, 'r') as inFile:
        urls = re.findall(r'%s' % r_e[0], inFile.read())
        x = list(set(urls))
        inFile.close()
    return x

data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
    print(sample[i])

A set is an unordered collection with no duplicate entries.

Answer 1

Use the built-in random.sample.

random.sample(population, k)
    Return a k length list of unique elements chosen from the population sequence or set.
    Used for random sampling without replacement.
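
As a small illustration of the documented behaviour, here is a minimal sketch. The `urls` values are made-up placeholders rather than anything from the log file. It also shows what happens when more items are requested than the population holds.

```python
import random

# Hypothetical stand-in data; any sequence works as the population.
urls = ['url1', 'url2', 'url3', 'url4', 'url5']

# Three items drawn without replacement, so no position is picked twice.
picked = random.sample(urls, 3)
print(picked)

# Requesting more items than the population holds raises ValueError.
try:
    random.sample(urls, len(urls) + 1)
except ValueError as exc:
    print('too many requested:', exc)
```

Note that uniqueness here is by position: if the population itself contains repeated values, the sample can still contain repeats.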

Addendum

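After seeing your edit, it looks like you're making this harder than it needs to be. I've hard-coded a list of URLS below, but where they come from doesn't matter. Selecting a (guaranteed unique) subset is essentially a one-liner with random.sample: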

import random

# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3

# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList)    # produces, e.g., => ['url7', 'url3', 'url4']

Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work isn't hardwired to the size of the collection or to the desired subset size.

Addendum 2

If the original input list contains duplicate values, the following slight modification will take care of it:

URLS = list(set(URLS))  # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]

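Or, more directly, sample the elements themselves instead of their indices. On Python 3.11 and later, random.sample no longer accepts a set, so the deduplicated values still go back into a list first:

URLS = list(set(URLS))
urlList = random.sample(URLS, SUBSET_SIZE)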
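
Putting the pieces together for the original question, a sketch along these lines might work. The regex is copied from the question; the function name `sample_unique_urls`, the `k` parameter, and the sample log text are assumptions made up for illustration.

```python
import random
import re

# Pattern copied from the question.
URL_PATTERN = re.compile(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}')

def sample_unique_urls(text, k=3):
    """Return up to k distinct URLs found in text, chosen at random."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(URL_PATTERN.findall(text)))
    # Cap k so random.sample does not raise ValueError on short input.
    return random.sample(unique, min(k, len(unique)))

# Made-up log text for demonstration only (one URL appears twice).
log_text = """
GET https://www.example.com/view?i=abc123
GET https://www.example.com/view?i=abc123
GET https://www.sample.com/view?i=def456
GET https://www.demo.com/view?i=ghi789
GET https://www.test.com/view?i=jkl012
"""
print(sample_unique_urls(log_text))
```

To use it on a real file, read the contents inside a `with open(...)` block and pass the resulting string in. The `with` statement closes the file on exit, so no explicit `close()` call is needed.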

Answer 2

seen = set(list_0)
randValue = URLS[rn.randint(1, 30)]

# [...]

if randValue not in seen:
  seen.add(randValue)
  list_0.append(randValue)

Now you just need to check whether the size of list_0 equals 3 to stop the loop.
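
Spelled out as a complete loop, the idea might look like the sketch below. The `URLS` contents and the `TARGET` constant are placeholders assumed for illustration, and `rn.randrange(len(URLS))` is swapped in for `rn.randint(1, 30)` so that index 0 is reachable and the index never runs past the end of the list.

```python
import random as rn

# Hypothetical stand-in for the URLS list from the question (note the duplicate).
URLS = ['url1', 'url2', 'url2', 'url3', 'url4']
TARGET = 3  # assumed number of unique values wanted

# Guard: the loop below could never finish with fewer than TARGET distinct values.
if len(set(URLS)) < TARGET:
    raise ValueError('not enough distinct values to choose from')

list_0 = []
seen = set()
while len(list_0) < TARGET:
    # randrange(len(URLS)) covers every valid index, including 0.
    randValue = URLS[rn.randrange(len(URLS))]
    if randValue not in seen:
        seen.add(randValue)
        list_0.append(randValue)

print(list_0)
```

This retry-until-unique approach is fine when the subset is small relative to the list. As the subset size approaches the number of distinct values, more draws are rejected and sampling without replacement becomes the better fit.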
