Question

I have a list of treatment and control users: 10% of the users are in control and the remaining 90% are in test.

control = range(1, 11)  # create sample control ids
test = range(11, 101)  # create sample test ids

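Now I want to randomly generate control and test pairs with the following rules:

Each control can be mapped to only 2 treatments, with no repetition. This means the output should look like this:

(1, 31), (1, 39), (2, 26), (2, 81), ...

So once test 31 has been matched with control 1, it cannot be matched with any other control.

Secondly, in the case above I used only two treatments per control. I would like that number to be passed as a parameter, so that any number of treatments can be matched.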

Answer 1

import numpy as np

control = np.array(range(1, 11))  # create sample control ids
test = np.array(range(11, 101))  # create sample test ids


def assign_treatments(treatment_per_control, control_ids, test_ids):
    control_treatment_pairs = []
    for control_id in control_ids:
        # Pick distinct positions in the remaining pool of test ids.
        random_indices = np.random.choice(len(test_ids), treatment_per_control, replace=False)
        treatment_ids = test_ids[random_indices]
        # Drop the chosen ids so that no later control can receive them.
        test_ids = np.delete(test_ids, random_indices)
        for treatment_id in treatment_ids:
            control_treatment_pairs.append((control_id, treatment_id))
    return control_treatment_pairs

control_treatment_pairs = assign_treatments(treatment_per_control=2, control_ids=control, test_ids=test)
for pair in control_treatment_pairs:
    print(pair)

Output from a sample run:

(1, 73)
(1, 44)
(2, 50)
(2, 77)
(3, 51)
(3, 17)
(4, 93)
(4, 42)
(5, 45)
(5, 82)
(6, 55)
(6, 81)
(7, 91)
(7, 76)
(8, 71)
(8, 70)
(9, 84)
(9, 11)
(10, 43)
(10, 23)

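If you have some experience with numpy, you can try the solution above. It samples from the test array without replacement, which guarantees that every control ID gets unique test IDs: on each iteration, the selected test IDs are removed from the test array. Be careful, though, with the case where control_id_count * treatment_per_control exceeds test_id_count: the pool then runs out before every control has been served, and np.random.choice raises a ValueError when asked for more items than remain with replace=False. This solution also assumes that treatment_per_control is the same for every control ID.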
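One way to handle the size problem mentioned above is to check the counts up front and fail with a clear message. The sketch below is only an illustration, not the code from the answer: the function name assign_treatments_checked and the seed parameter are made up here, and it swaps the per-iteration deletion for a single draw without replacement that is then split across the controls.

```python
import numpy as np


def assign_treatments_checked(treatment_per_control, control_ids, test_ids, seed=None):
    """Pair each control id with `treatment_per_control` unique test ids.

    Hypothetical helper: the name and the `seed` argument are assumptions
    made for this example.
    """
    control_ids = np.asarray(control_ids)
    test_ids = np.asarray(test_ids)

    # Refuse early if there are not enough test ids to go around.
    needed = len(control_ids) * treatment_per_control
    if needed > len(test_ids):
        raise ValueError(
            f"need {needed} test ids but only {len(test_ids)} are available"
        )

    rng = np.random.default_rng(seed)
    # A single draw without replacement covers every control at once.
    chosen = rng.choice(test_ids, size=needed, replace=False)
    # Repeat each control id so it lines up with its block of test ids.
    repeated = np.repeat(control_ids, treatment_per_control)
    return [(int(c), int(t)) for c, t in zip(repeated, chosen)]


pairs = assign_treatments_checked(3, range(1, 11), range(11, 101), seed=0)
```

With 10 controls and 90 test IDs, any treatment_per_control up to 9 fits; asking for 10 raises the ValueError instead of failing partway through the loop. Passing a seed makes a run repeatable, which can help when the pairing has to be audited later.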

Randomly matching pairs without repeating tests
