How to randomly assign treatment groups in Python?

Time: 2021-06-09 01:45:47

Tags: python pandas

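In my research I use a regression-based difference-in-differences specification. As a placebo test, I am trying to randomly assign placebo treatment entry years to all treated groups, drawn from a uniform distribution. For example, my original data looks like this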

and I want to randomly assign treatment years to all treated groups, drawn from a uniform distribution over 1996 to 2007. For example,

treatment_group_dummy treated_year group_number
 1                     1996            1
 1                     2005            3
 1                     2001            5
 1                     2006            5
 1                     2007            5
 1                     2002            5

This is my preliminary code, but I don't think it works at all...

treatment_group_dummy treated_year group_number
 1                     2007            1
 1                     1996            3
 1                     2004            5
 1                     2005            5
 1                     2001            5
 1                     2006            5

Does anyone have any insight on this?

Thanks in advance

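1 Answer:

Answer 0: (score: 1)

I don't see any initialization of numMembers in your code, so I'm not sure what size of list you want. But the following is one possible implementation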

import numpy as np
import pandas as pd

# set a random seed
np.random.seed(2021)
numGroups = 5

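# number of rows in the dataset
size = 10
data = {
    # group labels 1..numGroups (randint's upper bound is exclusive)
    'group': np.random.randint(1, numGroups+1, size),
    # years 1996..2007 inclusive, drawn uniformly
    'years': np.random.randint(1996, 2008, size)
}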
df = pd.DataFrame(data)

Edit 1: Based on the author's additional explanation, when we only want to randomize treated_year

df['treated_year'] = np.random.randint(1996, 2008, df.shape[0])
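
For a placebo test it may be preferable to keep the real treated_year and write the random draw to a separate column. Below is a minimal sketch of that idea, assuming the data has the treatment_group_dummy, treated_year and group_number columns shown in the question, and that only rows with treatment_group_dummy == 1 should receive a placebo year. The helper name assign_placebo_years and the column name placebo_year are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd


def assign_placebo_years(df, first_year=1996, last_year=2007, seed=None):
    """Return a copy of df with a hypothetical 'placebo_year' column.

    Treated rows (treatment_group_dummy == 1) get a year drawn uniformly
    from first_year..last_year inclusive; all other rows get NaN.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    treated = out["treatment_group_dummy"] == 1
    # endpoint=True makes last_year inclusive, unlike np.random.randint.
    draws = rng.integers(first_year, last_year, size=len(out), endpoint=True)
    # Keep the draw only for treated rows; untreated rows become NaN.
    out["placebo_year"] = pd.Series(draws, index=out.index).where(treated)
    return out


# Sample rows copied from the question's second table.
df = pd.DataFrame({
    "treatment_group_dummy": [1, 1, 1, 1, 1, 1],
    "treated_year": [2007, 1996, 2004, 2005, 2001, 2006],
    "group_number": [1, 3, 5, 5, 5, 5],
})
placebo_df = assign_placebo_years(df, seed=2021)
print(placebo_df)
```

Passing a different seed on each call gives a fresh placebo assignment, which is useful if the placebo regression is meant to be repeated many times.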