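I created a contingency table from the Titanic passenger data using hypergeometric sampling, which means that both sets of marginal totals are fixed in advance and equal. It is built on the Sex and Survived columns from 328 cases (164 men and 164 women), using the following code: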
First, I ungroup the data and drop the columns I don't need:
titanic = as.data.frame(Titanic)
titanic = titanic[rep(1:nrow(titanic),titanic$Freq),]
titanic = titanic[,c(2,4)]
Next, I draw a sample of 164 men:
men = subset(titanic, titanic$Sex == 'Male')
men = men[sample(nrow(men), 164), ]
table(men$Sex, men$Survived)
#           No Yes
#   Male   133  31
#   Female   0   0
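Now the women's row has to be filled with matching values: as many surviving women as men who died, and vice versa, so that the column totals are also 164: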
n = summary.factor(men$Survived)
womenYes = subset(titanic, (titanic$Sex == 'Female' & titanic$Survived=='Yes'))
womenYes = subset(womenYes[1:n[1], ])
womenNo = subset(titanic, (titanic$Sex == 'Female' & titanic$Survived=='No'))
womenNo = subset(womenNo[1:n[2], ])
women = merge(womenYes, womenNo, all = TRUE)
hyperSample = merge(men, women, all = TRUE)
table(hyperSample$Sex, hyperSample$Survived)
#           No Yes
#   Male   133  31
#   Female  31 133
It works, but it looks rather ugly. Can anyone suggest a more elegant or more efficient way? Thanks.
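For comparison, the question's own procedure can be written more compactly. In the sketch below, mirror_sample is a hypothetical helper name. It assumes the intended result is exactly what the code above produces: 164 randomly sampled men, plus women chosen so that the female row mirrors the male row. It uses rbind() instead of merge(). merge(..., all = TRUE) performs an outer join on the shared columns and only behaves like stacking rows here because the Sex/Survived keys never match between the pieces. The sketch also samples the women at random instead of taking the first rows, which is an assumption about what was intended.

```r
# Rebuild the one-row-per-passenger data (only Sex and Survived are kept)
titanic <- as.data.frame(Titanic)
titanic <- titanic[rep(seq_len(nrow(titanic)), titanic$Freq), c("Sex", "Survived")]

# Hypothetical helper: n random men, plus n women whose Survived counts are
# the male counts swapped, so every row and column total equals n.
mirror_sample <- function(data, n = 164) {
  males   <- data[data$Sex == "Male", ]
  females <- data[data$Sex == "Female", ]
  men   <- males[sample(nrow(males), n), ]
  n_no  <- sum(men$Survived == "No")   # men who died
  n_yes <- n - n_no                    # men who survived
  fem_yes <- females[females$Survived == "Yes", ]
  fem_no  <- females[females$Survived == "No", ]
  # sample() fails if more rows are requested than exist, so check first
  stopifnot(n_no <= nrow(fem_yes), n_yes <= nrow(fem_no))
  women <- rbind(fem_yes[sample(nrow(fem_yes), n_no), ],  # survivors = male deaths
                 fem_no[sample(nrow(fem_no), n_yes), ])   # deaths = male survivors
  rbind(men, women)
}

set.seed(1)
hyperSample <- mirror_sample(titanic)
table(hyperSample$Sex, hyperSample$Survived)
```

The printed table has the same shape as the one above, with the Female row the reverse of the Male row. The exact counts depend on the seed.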
Answer
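You could do the sampling in two stages, both with rhyper. First, determine the numbers of men and women, constraining only the total to 328 and assuming the population is distributed by sex as in the original data. This is what you might do if you were trying to bootstrap a statistic such as a ratio. Then use rhyper twice more to determine the number of survivors in each sex, with the same probabilities as in the rows of the original data.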
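The two stages can also be written in distributional terms. The population counts come from R's built-in Titanic table: 1731 males and 470 females, of whom 1364 males and 126 females died. The parameterization matches rhyper(nn, m, n, k), where m is the number of "successes", n the number of "failures", and k the number of draws:

```latex
\begin{aligned}
N_M &\sim \operatorname{Hypergeom}(m = 1731,\ n = 470,\ k = 328), & N_F &= 328 - N_M,\\
D_M \mid N_M &\sim \operatorname{Hypergeom}(m = 1364,\ n = 367,\ k = N_M), & S_M &= N_M - D_M,\\
D_F \mid N_F &\sim \operatorname{Hypergeom}(m = 126,\ n = 344,\ k = N_F), & S_F &= N_F - D_F,
\end{aligned}
\qquad
P(X = x) = \frac{\binom{m}{x}\binom{n}{k-x}}{\binom{m+n}{k}}
```

The sampled males are a uniformly random subset of all males, and likewise for females. Conditioning on N_M in the second stage therefore gives the same joint distribution as drawing 328 passengers at once without replacement.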
MFmat <- apply(Titanic, c(2, 4), sum)
nMale <- rhyper(1, rowSums(MFmat)[1], rowSums(MFmat)[2], 328)
#[1] 262   (random; the value changes from run to run)
nFemale <- 328 - nMale
DMale <- rhyper(1, MFmat[1,1], MFmat[1,2], nMale)
SurvMale <- nMale - DMale
DFemale <- rhyper(1, MFmat[2,1], MFmat[2,2], nFemale)
SurvFemale <- nFemale - DFemale
matrix(c(DMale, DFemale, SurvMale, SurvFemale), ncol = 2,
       dimnames = dimnames(MFmat))
#----
        Survived
Sex       No Yes
  Male   223  42
  Female  22  41
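For bootstrapping a statistic such as a ratio, as mentioned above, the two-stage draw can be wrapped in a function and repeated. In the sketch below, the helper name draw_table is hypothetical. The statistic (female survival rate divided by male survival rate) is also an illustrative assumption, not part of the original answer.

```r
MFmat <- apply(Titanic, c(2, 4), sum)   # Sex x Survived counts (2201 passengers)

# Hypothetical helper: one two-stage hypergeometric draw of `tot` passengers
draw_table <- function(tot = 328, pop = MFmat) {
  # Stage 1: number of males among `tot` draws without replacement
  nMale   <- rhyper(1, sum(pop[1, ]), sum(pop[2, ]), tot)
  nFemale <- tot - nMale
  # Stage 2: number who died ("No") within each sex
  DMale   <- rhyper(1, pop[1, 1], pop[1, 2], nMale)
  DFemale <- rhyper(1, pop[2, 1], pop[2, 2], nFemale)
  matrix(c(DMale, DFemale, nMale - DMale, nFemale - DFemale),
         ncol = 2, dimnames = dimnames(pop))
}

set.seed(42)
tabs <- replicate(1000, draw_table(), simplify = FALSE)
# Illustrative statistic (assumption): female survival rate / male survival rate
ratio <- sapply(tabs, function(m) (m[2, 2] / sum(m[2, ])) / (m[1, 2] / sum(m[1, ])))
quantile(ratio, c(0.025, 0.5, 0.975))
```

Each replicate totals 328 passengers, but the split between men and women varies, just as in the single draw above.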
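I suppose you could also sample the two rows separately, with each row total fixed at 164, and the logic above carries over if you decide to do that. Note that this fixes only the row totals; the column totals are still random. Which approach is more appropriate depends on the underlying question.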
# Fixed row marginals....
nMale <- 164
nFemale <- 164
DMale <- rhyper(1, MFmat[1,1], MFmat[1,2], nMale)
SurvMale <- nMale - DMale
DFemale <- rhyper(1, MFmat[2,1], MFmat[2,2], nFemale)
SurvFemale <- nFemale - DFemale
matrix(c(DMale, DFemale, SurvMale, SurvFemale), ncol = 2,
       dimnames = dimnames(MFmat))
#----------------
        Survived
Sex       No Yes
  Male   127  37
  Female  39 125
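If only the 2x2 table is needed rather than individual passenger rows, the question's target (all four margins equal to 164) can be reached directly. With every margin fixed at 164, the table must have the form Male: a, 164 - a; Female: 164 - a, a. A single draw for the Male/No cell a therefore determines the whole table.

The sketch below shows two ways to choose a. It also shows base R's stats::r2dtable(), which generates random tables with given row and column totals under independence (Patefield's algorithm). The helper fill_table is a hypothetical name. Option (1) has the same distribution as the question's own code. Option (2) and r2dtable() assume that sex and survival are independent, so they will not reflect the large survival difference between men and women in the data.

```r
MFmat <- apply(Titanic, c(2, 4), sum)   # Sex x Survived counts

# Hypothetical helper: with all margins equal to n, the Male/No cell `a` fixes
# the table as  Male: a, n - a;  Female: n - a, a
fill_table <- function(a, n = 164) {
  matrix(c(a, n - a, n - a, a), ncol = 2, dimnames = dimnames(MFmat))
}

set.seed(3)
# (1) Same distribution as the question's code: deaths among 164 randomly
#     sampled men; the female row is the mirror image of the male row
t1 <- fill_table(rhyper(1, MFmat[1, 1], MFmat[1, 2], 164))
# (2) Both margins fixed, assuming Sex and Survived are independent:
#     the Male/No cell is hypergeometric with m = n = k = 164
t2 <- fill_table(rhyper(1, 164, 164, 164))
# The same independence model via stats::r2dtable (also works for larger tables)
t3 <- r2dtable(1, r = c(164, 164), c = c(164, 164))[[1]]
dimnames(t3) <- dimnames(MFmat)
t1; t2; t3
```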