Defining a range for a multivariate normal distribution

Time: 2019-12-13 12:54:48

Tags: python numpy distribution montecarlo


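I have a pandas.DataFrame of shape 780x6, in which 3 columns hold binary values ('treatment', 'married', 'nodegree') and 3 hold floats (columns 3:6). I want to run a Monte Carlo simulation on the three non-binary columns. To do so, I first build the row index for every possible combination of the binary values, to use later in the MC simulation: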

index000 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index 
index001 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index010 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index100 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index011 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
index110 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index101 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index111 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
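The eight index objects above could also be built in a loop over all 0/1 combinations. This is only a minimal sketch: the stand-in data, the float column names `x1`, `x2`, `x3`, and the dict name `group_index` are assumptions and do not come from the original DataFrame.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical stand-in for X_MC: 3 binary columns followed by 3 float columns.
rng = np.random.default_rng(0)
n_rows = 780
X_MC = pd.DataFrame({
    "treatment": rng.integers(0, 2, n_rows),
    "married": rng.integers(0, 2, n_rows),
    "nodegree": rng.integers(0, 2, n_rows),
    "x1": rng.normal(30.0, 5.0, n_rows),
    "x2": rng.normal(7.0, 2.0, n_rows),
    "x3": rng.normal(100.0, 20.0, n_rows),
})

binary_cols = ["treatment", "married", "nodegree"]
binary_values = X_MC[binary_cols].to_numpy()

# Map each (treatment, married, nodegree) combination to the index of its rows.
group_index = {}
for combo in itertools.product([0, 1], repeat=3):
    mask = (binary_values == np.array(combo)).all(axis=1)
    group_index[combo] = X_MC.index[mask]
```

With this layout, `group_index[(0, 0, 1)]` plays the role of `index001`.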

Next, I compute the mean() and covariance cov() of the three non-binary columns for each group:

# Per-group mean vector of the three float columns (positions 3:6).
# Note: passing an Index to .iloc assumes X_MC has the default 0..n-1 index.
mean000 = X_MC.iloc[ index000, 3 : 6 ].mean()
mean001 = X_MC.iloc[ index001, 3 : 6 ].mean()
mean010 = X_MC.iloc[ index010, 3 : 6 ].mean()
mean100 = X_MC.iloc[ index100, 3 : 6 ].mean()
mean011 = X_MC.iloc[ index011, 3 : 6 ].mean()
mean110 = X_MC.iloc[ index110, 3 : 6 ].mean()
mean101 = X_MC.iloc[ index101, 3 : 6 ].mean()
mean111 = X_MC.iloc[ index111, 3 : 6 ].mean()
# Per-group 3x3 covariance matrix of the same columns.
cov000 = X_MC.iloc[ index000, 3 : 6 ].cov()
cov001 = X_MC.iloc[ index001, 3 : 6 ].cov()
cov010 = X_MC.iloc[ index010, 3 : 6 ].cov()
cov100 = X_MC.iloc[ index100, 3 : 6 ].cov()
cov011 = X_MC.iloc[ index011, 3 : 6 ].cov()
cov110 = X_MC.iloc[ index110, 3 : 6 ].cov()
cov101 = X_MC.iloc[ index101, 3 : 6 ].cov()
cov111 = X_MC.iloc[ index111, 3 : 6 ].cov()
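The per-group statistics could also be obtained in one pass with `groupby`. This is a sketch on hypothetical stand-in data (the float column names `x1`, `x2`, `x3` are assumptions), not the original DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_MC: 3 binary columns followed by 3 float columns.
rng = np.random.default_rng(0)
n_rows = 780
X_MC = pd.DataFrame({
    "treatment": rng.integers(0, 2, n_rows),
    "married": rng.integers(0, 2, n_rows),
    "nodegree": rng.integers(0, 2, n_rows),
    "x1": rng.normal(30.0, 5.0, n_rows),
    "x2": rng.normal(7.0, 2.0, n_rows),
    "x3": rng.normal(100.0, 20.0, n_rows),
})

binary_cols = ["treatment", "married", "nodegree"]
float_cols = ["x1", "x2", "x3"]

grouped = X_MC.groupby(binary_cols)[float_cols]
means = grouped.mean()  # one row per (treatment, married, nodegree) combination
covs = grouped.cov()    # one 3x3 block per combination, stacked under a MultiIndex

# Statistics for treatment=0, married=0, nodegree=0.
mean000 = means.loc[(0, 0, 0)]
cov000 = covs.loc[(0, 0, 0)]
```

Here `means.loc[(t, m, n)]` and `covs.loc[(t, m, n)]` stand in for the sixteen separate variables, provided every combination actually occurs in the data.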

The results from the previous step are then used to define the distributions for the MC simulation:

df_MC.iloc[ index000, 3 : 6 ] = np.random.multivariate_normal( mean000, cov000, len( index000 ) )
df_MC.iloc[ index001, 3 : 6 ] = np.random.multivariate_normal( mean001, cov001, len( index001 ) )
df_MC.iloc[ index010, 3 : 6 ] = np.random.multivariate_normal( mean010, cov010, len( index010 ) )
df_MC.iloc[ index100, 3 : 6 ] = np.random.multivariate_normal( mean100, cov100, len( index100 ) )
df_MC.iloc[ index011, 3 : 6 ] = np.random.multivariate_normal( mean011, cov011, len( index011 ) )
df_MC.iloc[ index110, 3 : 6 ] = np.random.multivariate_normal( mean110, cov110, len( index110 ) )
df_MC.iloc[ index101, 3 : 6 ] = np.random.multivariate_normal( mean101, cov101, len( index101 ) )
df_MC.iloc[ index111, 3 : 6 ] = np.random.multivariate_normal( mean111, cov111, len( index111 ) )

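Unfortunately, np.random.multivariate_normal only lets the user specify the mean, the covariance, and the number of samples. There is therefore no way to force the randomly generated values to lie within a given range.
However, I would like to set a minimum and a maximum for each column, based on empirical values.
So, in addition to the specified mean, cov, and length, the drawn values should be, for example, no larger than 50 and no smaller than 17. The challenge is not only to define a range that all values from np.random.multivariate_normal must fall in, but to define that range separately for each non-binary column. The second column, for instance, should have a maximum of 12 instead of 50 and a minimum of 2 instead of 17.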
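One possible way to express such per-column bounds is rejection sampling: draw from the unbounded distribution and keep only rows where every column lies inside its range. The following is a minimal sketch under stated assumptions. The function name `sample_truncated_mvn`, the example mean and covariance, and the unbounded third column are hypothetical; only the bounds 17/50 and 2/12 come from the description above.

```python
import numpy as np


def sample_truncated_mvn(mean, cov, size, lower, upper, rng=None, max_rounds=1000):
    """Draw `size` rows from N(mean, cov), keeping only rows inside [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    accepted = np.empty((0, mean.size))
    for _ in range(max_rounds):
        # Draw a batch and keep the rows whose columns all respect their bounds.
        draws = rng.multivariate_normal(mean, cov, size=max(size, 1))
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        accepted = np.vstack([accepted, draws[inside]])
        if len(accepted) >= size:
            return accepted[:size]
    raise RuntimeError("acceptance rate too low for the given bounds")


# Hypothetical group statistics; in the question these would be mean000, cov000, etc.
example_mean = [30.0, 7.0, 100.0]
example_cov = np.diag([25.0, 4.0, 400.0])
lower = [17.0, 2.0, -np.inf]  # third column left unbounded (assumption)
upper = [50.0, 12.0, np.inf]

samples = sample_truncated_mvn(
    example_mean, example_cov, size=200,
    lower=lower, upper=upper, rng=np.random.default_rng(42),
)
```

Discarding out-of-range rows changes the distribution, so the mean and covariance of the accepted samples generally differ from the values passed in. Tight bounds also lower the acceptance rate, which is why the sketch gives up after `max_rounds` batches.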


Is there a way to achieve this?

0 answers:

No answers yet