How to randomly assign values to a variable

Date: 2014-11-11 03:23:48

Tags: variables random stata

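In Stata, I want to create a new variable whose values follow a known probability distribution.

Suppose the probability distribution is as follows: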

Blue - .2
Red - .3
Green - .5

I can use the following code to reproduce the distribution above exactly. First, is there a faster way to do this?

gen Color = ""
replace Color = "Blue" if _n <= _N*.2
replace Color = "Red" if _n > _N*.2 & _n <= _N*.5
replace Color = "Green" if Color==""

To simulate random draws, I think I could do:

gen rand = runiform()
sort rand
gen Color = ""
replace Color = "Blue" if rand <= .2
replace Color = "Red" if rand > .2 & rand <= .5
replace Color = "Green" if Color==""

Is this technique best practice?

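1 answer:

Answer 0 (score: 1)

When generating the data, you can use -in-, which is more efficient than -if-. To be honest, though, I think the dataset would have to be very large for the time difference to be noticeable. You can run some experiments to check.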
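One way to run such an experiment is with Stata's -timer- command. The sketch below is only illustrative: the number of observations is an arbitrary assumption, and the variable is declared as str5 up front so that neither run pays for a change of storage type.

```stata
clear
set obs 1000000          // hypothetical size; increase it if both timings are near zero
gen str5 Color = ""

timer clear

// version with -if-: the condition is evaluated for every observation
timer on 1
replace Color = "Blue" if _n <= _N*.2
timer off 1

replace Color = ""       // reset before the second run

// version with -in-: only the listed range of observations is visited
timer on 2
replace Color = "Blue" in 1/`=floor(_N*.2)'
timer off 2

timer list               // timer 1 = -if-, timer 2 = -in-
```

The timings depend on the machine and on the Stata flavor, so it is worth repeating the runs a few times before drawing conclusions.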

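The second question, on random draws, is covered in a series of blog posts by Bill Gould (President of StataCorp). The code below has inline comments; you can run the whole thing and check the results.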

clear
set more off

*----- first question -----

/* create data with certain distribution */

set obs 100
set seed 23956

gen obs = _n
gen rand = runiform()
sort rand

gen Color = ""

/* 
// original
replace Color = "Blue" if _n <= _N*.2
replace Color = "Red" if _n > _N*.2 & _n <= _N*.5
replace Color = "Green" if Color==""
*/

// using -in-
replace Color = "Blue" in 1/`=floor(_N*.2)'
replace Color = "Red" in `=floor(_N*.2) + 1'/`=floor(_N*.5)'
replace Color = "Green" in `=floor(_N*.5) + 1'/L

/* 
// using -cond()-
gen Color = cond(_n <= _N*.2, "Blue", cond(_n > _N*.2 & _n <= _N*.5, "Red", "Green"))
*/

drop rand
sort obs

tempfile allobs
save "`allobs'"

tab Color

*----- second question -----

/* draw without replacement a random sample of 20 
observations from a dataset of N observations */

set seed 89365
sort obs // for reproducibility
generate double u = runiform()
sort u
keep in 1/20

tab obs Color

/* If N>1,000, generate two random variables u1 and u2 
in place of u, and substitute sort u1 u2 for sort u */

/* draw with replacement a random sample of 20 
observations from a dataset of N observations */

clear

set seed 08236
drop _all
set obs 20
generate long obsno = floor(100*runiform()+1)
sort obsno
tempfile obstodraw
save "`obstodraw'"

use "`allobs'", clear
generate long obsno = _n
merge 1:m obsno using "`obstodraw'", keep(match) nogen

tab obs Color

These and other details can be found in the four-part series on random-number generators by Bill Gould: http://blog.stata.com/2012/10/24/using-statas-random-number-generators-part-4-details/
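If the aim is instead what the question's second snippet does, namely giving each observation a color independently with probabilities .2, .3 and .5, a compact variant is to nest -cond()- on a single uniform draw. This is a minimal sketch, not taken from the blog series; the number of observations and the seed are arbitrary assumptions, and no sort is needed.

```stata
clear
set obs 10000            // hypothetical sample size
set seed 12345           // arbitrary seed, only for reproducibility

gen double u = runiform()

// u <= .2 -> Blue (prob .2); .2 < u <= .5 -> Red (prob .3); otherwise Green (prob .5)
gen Color = cond(u <= .2, "Blue", cond(u <= .5, "Red", "Green"))

tab Color                // shares are close to, but not exactly, 20/30/50
```

Unlike the -in- approach, the realized shares vary from run to run, which is the point of a random assignment.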

See also help sample.
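As a rough illustration of the built-in commands, the sketch below draws 20 observations without replacement using -sample- and 20 with replacement using -bsample-. The dataset of 100 observations and the seed are assumptions chosen to mirror the sizes used in this answer.

```stata
clear
set obs 100
gen obs = _n
set seed 89365           // arbitrary seed, only for reproducibility

// without replacement: keep exactly 20 randomly chosen observations
preserve
sample 20, count
count
restore

// with replacement: -bsample- replaces the data in memory with 20 draws,
// so the same original observation can appear more than once
bsample 20
duplicates report obs
```

Both commands overwrite the data in memory, which is why the first draw is wrapped in -preserve- and -restore-.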