Question

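I have a pandas.DataFrame of shape 780x6 in which 3 columns hold binary values ('treatment', 'married', 'nodegree') and 3 columns hold floats (columns 3:6). I want to run a Monte Carlo simulation on the three non-binary columns. To do that, I first build the row index for every possible combination of the binary values, to use later in the MC simulation: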

index000 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index 
index001 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index010 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index100 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index011 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
index110 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index101 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index111 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index

Then I compute the mean() and covariance cov() of the three non-binary columns for each group:

mean000 = X_MC.iloc[ index000, 3 : 6 ].mean()
mean001 = X_MC.iloc[ index001, 3 : 6 ].mean()
mean010 = X_MC.iloc[ index010, 3 : 6 ].mean()
mean100 = X_MC.iloc[ index100, 3 : 6 ].mean()
mean011 = X_MC.iloc[ index011, 3 : 6 ].mean()
mean110 = X_MC.iloc[ index110, 3 : 6 ].mean()
mean101 = X_MC.iloc[ index101, 3 : 6 ].mean()
mean111 = X_MC.iloc[ index111, 3 : 6 ].mean()
cov000 = X_MC.iloc[ index000, 3 : 6 ].cov()
cov001 = X_MC.iloc[ index001, 3 : 6 ].cov()
cov010 = X_MC.iloc[ index010, 3 : 6 ].cov()
cov100 = X_MC.iloc[ index100, 3 : 6 ].cov()
cov011 = X_MC.iloc[ index011, 3 : 6 ].cov()
cov110 = X_MC.iloc[ index110, 3 : 6 ].cov()
cov101 = X_MC.iloc[ index101, 3 : 6 ].cov()
cov111 = X_MC.iloc[ index111, 3 : 6 ].cov()

The results from the previous step are used to define the distributions for the MC simulation:

df_MC.iloc[ index000, 3 : 6 ] = np.random.multivariate_normal( mean000, cov000, len( index000 ) )
df_MC.iloc[ index001, 3 : 6 ] = np.random.multivariate_normal( mean001, cov001, len( index001 ) )
df_MC.iloc[ index010, 3 : 6 ] = np.random.multivariate_normal( mean010, cov010, len( index010 ) )
df_MC.iloc[ index100, 3 : 6 ] = np.random.multivariate_normal( mean100, cov100, len( index100 ) )
df_MC.iloc[ index011, 3 : 6 ] = np.random.multivariate_normal( mean011, cov011, len( index011 ) )
df_MC.iloc[ index110, 3 : 6 ] = np.random.multivariate_normal( mean110, cov110, len( index110 ) )
df_MC.iloc[ index101, 3 : 6 ] = np.random.multivariate_normal( mean101, cov101, len( index101 ) )
df_MC.iloc[ index111, 3 : 6 ] = np.random.multivariate_normal( mean111, cov111, len( index111 ) )
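
For reference, the three steps above (select each group, estimate its mean and covariance, draw replacement values) can be sketched more compactly with groupby. This is only a sketch under assumptions: the random stand-in for X_MC and the float column names x1, x2, x3 are hypothetical, and df_MC is assumed to be a copy of X_MC.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for X_MC: 3 binary columns followed by 3 float columns.
# The names x1, x2, x3 are placeholders, not the real column names.
n = 780
X_MC = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "married": rng.integers(0, 2, n),
    "nodegree": rng.integers(0, 2, n),
    "x1": rng.normal(30.0, 5.0, n),
    "x2": rng.normal(7.0, 1.5, n),
    "x3": rng.normal(100.0, 20.0, n),
})

binary_cols = ["treatment", "married", "nodegree"]
float_cols = list(X_MC.columns[3:6])

# Assumption: df_MC starts out as a copy of X_MC.
df_MC = X_MC.copy()

# One pass per combination of the binary columns (up to 8 groups).
for key, group in X_MC.groupby(binary_cols):
    mean = group[float_cols].mean().to_numpy()
    cov = group[float_cols].cov().to_numpy()
    # Replace the float columns of this group with simulated values.
    df_MC.loc[group.index, float_cols] = rng.multivariate_normal(
        mean, cov, size=len(group)
    )
```

Selecting rows with .loc and group.index uses index labels, so this version does not depend on the DataFrame having a default integer index.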

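Unfortunately, np.random.multivariate_normal only lets the user specify the mean, the cov, and the number of samples to draw. There is no way to force the randomly generated values to lie within a given range.
However, I would like to set a minimum and a maximum for each column based on empirical values.
So, in addition to the given mean, cov and length, the drawn values should not be greater than, e.g., 50 or smaller than 17. The challenge is not only to define a range that all values from np.random.multivariate_normal must lie in, but to define that range separately for each non-binary column. For example, the second column should have a maximum of 12 instead of 50 and a minimum of 2 instead of 17.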

Is it possible to achieve this?
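
To make the requirement concrete, here is a minimal sketch of one possible approach, plain rejection sampling: draw from np.random's multivariate normal and keep only the rows where every column falls inside its own [min, max] range. The helper name sample_truncated_mvn, its max_tries parameter, and the example mean, covariance and third-column bounds are all hypothetical. Note that discarding draws changes the distribution: the mean and covariance of the kept samples will generally differ from the mean and cov passed in.

```python
import numpy as np

def sample_truncated_mvn(mean, cov, lower, upper, size, rng=None, max_tries=1000):
    """Draw `size` rows from N(mean, cov), keeping only rows whose every
    component lies within its per-column [lower, upper] bounds.

    Hypothetical helper using plain rejection sampling; it is not part of NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    accepted = np.empty((0, mean.size))
    for _ in range(max_tries):
        draws = rng.multivariate_normal(mean, cov, size=size)
        # A row is kept only if all of its columns are inside their bounds.
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        accepted = np.vstack([accepted, draws[inside]])
        if len(accepted) >= size:
            return accepted[:size]
    raise RuntimeError("acceptance rate too low for the given bounds")

# Example values (assumed for illustration): column 1 in [17, 50],
# column 2 in [2, 12], column 3 in [0, 200].
mean = [30.0, 7.0, 100.0]
cov = np.diag([36.0, 4.0, 400.0])
lower = [17.0, 2.0, 0.0]
upper = [50.0, 12.0, 200.0]

samples = sample_truncated_mvn(mean, cov, lower, upper, size=100,
                               rng=np.random.default_rng(0))
```

Rejection sampling becomes slow when the bounds cut off most of the probability mass, which is why the sketch gives up after max_tries rounds instead of looping forever.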

Defining a range for a multivariate normal distribution

0 answers: