Creating a contingency table by hypergeometric sampling from the Titanic dataset

Time: 2018-07-21 21:54:29

Tags: r contingency

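I created a contingency table from the Titanic passenger data using hypergeometric sampling, which means that both sets of marginal totals are fixed in advance and equal. It is built on the Sex and Survived columns from 328 cases (164 men and 164 women), with the following code: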

First, I ungroup the data (one row per passenger) and drop the columns I do not need:

titanic = as.data.frame(Titanic)
titanic = titanic[rep(1:nrow(titanic),titanic$Freq),]
titanic = titanic[,c(2,4)]
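
A quick way to confirm the un-grouping step, assuming R's built-in Titanic table, is to compare the expanded data frame with the original counts. This is only a sketch of a sanity check; the names expanded and original are introduced here for illustration:

```r
# Expand the 4-way Titanic table to one row per passenger,
# keeping only the Sex and Survived columns (columns 2 and 4)
titanic <- as.data.frame(Titanic)
titanic <- titanic[rep(seq_len(nrow(titanic)), titanic$Freq), c("Sex", "Survived")]

# One row per passenger: the row count must equal the table total
stopifnot(nrow(titanic) == sum(Titanic))

# Cross-tabulating the expanded rows should reproduce the Sex x Survived counts
expanded <- table(titanic$Sex, titanic$Survived)
original <- apply(Titanic, c(2, 4), sum)
stopifnot(all(expanded == original))
```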

Then I draw a random sample of 164 men:

men = subset(titanic, titanic$Sex == 'Male')
men = men [sample(nrow(men),164), ]
table(men$Sex, men$Survived)

#           No Yes
#   Male   133  31
#   Female   0   0

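Now the women's row has to be filled with the matching values (the men's counts mirrored, so that the column totals are also 164):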

n = summary.factor(men$Survived)
womenYes = subset(titanic, (titanic$Sex == 'Female' & titanic$Survived=='Yes'))
womenYes = subset(womenYes[1:n[1], ])
womenNo = subset(titanic, (titanic$Sex == 'Female' & titanic$Survived=='No'))
womenNo = subset(womenNo[1:n[2], ])
women = merge(womenYes, womenNo, all = TRUE)
hyperSample = merge(men, women, all = TRUE)
table(hyperSample$Sex, hyperSample$Survived)

#           No Yes
#   Male   133  31
#   Female  31 133

It works, but to be honest it looks a bit ugly. Maybe someone can find a more elegant or more efficient way. Thanks.

1 answer:

Answer 0: (score: 0)

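You can do the sampling in two stages, both with rhyper. First, determine how many men and women end up in a sample of 328, assuming the population is split by sex in the same way as the original data. You might do this if you were trying to bootstrap a statistic such as a ratio. Then call rhyper two more times, once per sex, to determine the number of deaths (and hence survivors) in each row, using the same proportions as in the corresponding row of the original data.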

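# Sex x Survived counts (rows: Male, Female; columns: No, Yes)
MFmat <- apply(Titanic, c(2, 4), sum)
# Stage 1: number of men in a sample of 328 drawn from all passengers
nMale <- rhyper(1, rowSums(MFmat)[1], rowSums(MFmat)[2], 328)
# nMale is random, e.g. 262 in one run
nFemale <- 328 - nMale
# Stage 2: deaths within each sex, drawn from that sex's row of MFmat
DMale <- rhyper(1, MFmat[1,1], MFmat[1,2], nMale)
SurvMale <- nMale - DMale
DFemale <- rhyper(1, MFmat[2,1], MFmat[2,2], nFemale)
SurvFemale <- nFemale - DFemale
matrix(c(DMale, DFemale, SurvMale, SurvFemale), ncol = 2,
       dimnames = dimnames(MFmat))
#---- (output from one run; values vary between runs)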
        Survived
Sex       No Yes
  Male   223  42
  Female  22  41
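
The two-stage draw above can be wrapped in a function and repeated, for instance to look at the spread of a ratio statistic as mentioned earlier. What follows is a minimal sketch under stated assumptions: the helper name draw_table, the seed, the number of replicates, and the choice of statistic (female-to-male survival-rate ratio) are all introduced here for illustration and are not part of the code above.

```r
# Sex x Survived counts from the built-in Titanic table
MFmat <- apply(Titanic, c(2, 4), sum)

# Hypothetical helper: one two-stage hypergeometric draw of `size` passengers
draw_table <- function(size = 328) {
  n_male   <- rhyper(1, rowSums(MFmat)[1], rowSums(MFmat)[2], size)
  n_female <- size - n_male
  d_male   <- rhyper(1, MFmat[1, 1], MFmat[1, 2], n_male)
  d_female <- rhyper(1, MFmat[2, 1], MFmat[2, 2], n_female)
  matrix(c(d_male, d_female, n_male - d_male, n_female - d_female),
         ncol = 2, dimnames = dimnames(MFmat))
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
tabs <- replicate(1000, draw_table(), simplify = FALSE)

# Illustrative statistic: ratio of female to male survival rates
rate_ratio <- sapply(tabs, function(tb) {
  (tb["Female", "Yes"] / sum(tb["Female", ])) /
    (tb["Male", "Yes"] / sum(tb["Male", ]))
})
quantile(rate_ratio, c(0.025, 0.5, 0.975))
```

Working with counts only, rather than with sampled rows of the expanded data frame, keeps each replicate cheap; every table still sums to 328.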

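I suppose you could instead sample the two rows separately with fixed row totals, and the same logic as above carries over if you decide to do that. Which approach is more appropriate depends on the underlying question.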

# Fixed row marginals (reuses MFmat from above)
nMale <- 164
nFemale <- 164
DMale <- rhyper(1, MFmat[1,1], MFmat[1,2], nMale)
SurvMale <- nMale - DMale
DFemale <- rhyper(1, MFmat[2,1], MFmat[2,2], nFemale)
SurvFemale <- nFemale - DFemale
matrix(c(DMale, DFemale, SurvMale, SurvFemale), ncol = 2,
       dimnames = dimnames(MFmat))
#---------------- (output from one run; values vary between runs)
        Survived
Sex       No Yes
  Male   127  37
  Female  39 125
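
Fixing only the row totals leaves the column totals free (166 and 162 in the output above). If both margins have to be fixed at 164, as in the question, a 2x2 table has a single free cell, so one random draw determines the whole table. Two hedged sketches follow, and which one fits is an assumption about the intended question. Sketch (a) mirrors the question's construction: male deaths are drawn from the male passengers and the female row is set to the mirror image. Sketch (b) uses r2dtable from base R's stats package, which draws tables with the given margins under independence of rows and columns, so it is a different model from (a). The names k, both_fixed and indep are introduced here for illustration.

```r
# Sex x Survived counts from the built-in Titanic table
MFmat <- apply(Titanic, c(2, 4), sum)
k <- 164  # common value of all four marginal totals

# (a) Mirror construction, as in the question:
#     draw male deaths, then fill the female row with the swapped counts
d_male <- rhyper(1, MFmat["Male", "No"], MFmat["Male", "Yes"], k)
both_fixed <- matrix(c(d_male, k - d_male, k - d_male, d_male),
                     ncol = 2, dimnames = dimnames(MFmat))
both_fixed

# (b) Random table with the same margins under row/column independence
indep <- r2dtable(1, r = c(k, k), c = c(k, k))[[1]]
dimnames(indep) <- dimnames(MFmat)
indep
```

In both cases rowSums() and colSums() of the result are all 164.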