Question

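I have a large list of elements (for this example I'll assume it is filled with numbers). For example: l = [1,2,3,4,5,6,7,8,9,10]. Now I want to draw 2 samples from that list: one containing 80% of the elements (randomly chosen, of course) and the other containing the remaining 20%, so that I can use the larger one to train a machine learning tool and the rest to test that training. The function I'm using is from random, and I use it like this: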

import random

sz = len(l) #Size of the original list
per = int((80 * sz) / 100) #Length of the sample list holding 80% of the elements
random.seed(1) #Fixed seed, as I want to obtain the same results every time I run it
l2 = random.sample(l, per)

I'm not entirely sure, but I believe this code gives me a random sample with 80% of the numbers.

l2 = [3,4,7,2,9,5,1,8]

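However, I can't find a way to get the other sample list with the remaining elements, l3 = [6,10] (the sample() function does not remove the elements it takes from the original list). Can you help me? Thanks in advance.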

Answer 1

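Although most machine learning libraries include easy-to-use splitting functions, as mentioned earlier, the following code works for me to randomly split a list into two (training/test) sets: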

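import random

l = [1,2,3,4,5,6,7,8,9,10]
sz = len(l)
cut = int(0.8 * sz) #80% of the list
shuffled_l = l[:] # copy, so the original list keeps its order
random.shuffle(shuffled_l) # shuffles in place and returns None, so do not assign its result
l2 = shuffled_l[:cut] # first 80% of shuffled list
l3 = shuffled_l[cut:] # last 20% of shuffled list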
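
If you would rather keep the random.sample() call from the question, one possible approach is to sample positions instead of values and then collect whatever was not picked. The sketch below is only an illustration: split_sample and its parameters are hypothetical names, not part of the random module. Working on indices means duplicate values in the list are handled correctly, which a plain "x not in l2" test would not guarantee.

```python
import random

def split_sample(items, fraction=0.8, seed=None):
    """Hypothetical helper: return (sample, rest) as two disjoint lists."""
    rng = random.Random(seed)  # local generator, global random state is untouched
    k = int(fraction * len(items))  # size of the larger sample
    chosen = set(rng.sample(range(len(items)), k))  # randomly picked positions
    sample = [x for i, x in enumerate(items) if i in chosen]
    rest = [x for i, x in enumerate(items) if i not in chosen]
    return sample, rest

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2, l3 = split_sample(l, 0.8, seed=1)
print(len(l2), len(l3))
```

Note that both lists keep the original relative order of the elements; if the training set must also be in random order, shuffle l2 afterwards.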

How do I get the remaining sample after using random.sample() in Python?
