How do I express a t-statistic for a binary outcome?

Time: 2017-03-07 02:28:46

Tags: r t-test

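The question is as follows:

Read the file diabetes.csv. It has two variables called BMI and Outcome. The variable Outcome takes only two values: 0 and 1. Conduct a non-parametric two-sample test of the hypothesis that the standard deviation of BMI is the same for both Outcome values.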

https

The data is in a dataset named "diabetes". My question is: how do I express the "t-statistic", given that the outcome is binary?

1 Answer:

Answer 0: (score: 0)

Use this code:

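# Read the data (assumes diabetes.csv is in the working directory)

diabetes = read.csv("diabetes.csv")

# Sort the diabetes table in ascending order of Outcome to separate the BMI
# values with outcome = 0 from the BMI values with outcome = 1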

diabetes = diabetes[order(diabetes$Outcome),]   
View(diabetes)

# Find the number of values with outcome = 0

n = length(which(diabetes$Outcome == 0)) 

# Find total number of rows 

l = length(diabetes$BMI)               

# Find BMI values to create the sample later on

g = diabetes$BMI                           

# Create a function that shuffles the BMI values on every call, splits them
# into two groups of the original sizes, and returns the absolute difference
# between the two groups' standard deviations

f1 = function()
{
  x = sample(g)             
  z = abs(sd(x[1:n]) - sd(x[(n+1):l]))
  return(z)
}

# Call the function 100000 times to build the permutation (null) distribution

dist = replicate(100000,f1())          

# Plot density of distribution

plot(density(dist))                    

polygon(density(dist),col="green")


diabetes0 = diabetes[diabetes$Outcome == 0,]
diabetes1 = diabetes[diabetes$Outcome == 1,]

View(diabetes0)
View(diabetes1)

# Observed test statistic: the absolute difference between the standard
# deviation of BMI when outcome = 0 and when outcome = 1

tstat = abs(sd(diabetes0$BMI) - sd(diabetes1$BMI))       

tstat

abline(v=tstat)                                           
# p-value: proportion of shuffled differences at least as large as the
# observed difference

pvalue = mean(dist >= tstat)
pvalue
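
The same idea can be wrapped in a reusable function. The sketch below is one possible way to do it, not the only one: `perm_sd_test` and the `fake` data frame are hypothetical names introduced here for illustration, and the data is randomly generated rather than taken from diabetes.csv. It shuffles the group labels instead of sorting the table, and uses the absolute difference in standard deviations as the test statistic, so no actual t-statistic is needed.

```r
# Hypothetical helper: permutation test for equal standard deviations
# in two groups. x holds the measurements, group holds the two labels.
perm_sd_test <- function(x, group, n_perm = 10000) {
  stopifnot(length(x) == length(group))
  group <- as.factor(group)
  stopifnot(nlevels(group) == 2)

  # Test statistic: absolute difference between the two group SDs
  sd_diff <- function(g) {
    s <- tapply(x, g, sd)
    abs(s[[1]] - s[[2]])
  }

  observed <- sd_diff(group)

  # Shuffle the labels to build the distribution under the null hypothesis
  null_dist <- replicate(n_perm, sd_diff(sample(group)))

  # Add 1 to numerator and denominator so the p-value is never exactly 0
  p_value <- (sum(null_dist >= observed) + 1) / (n_perm + 1)

  list(statistic = observed, p_value = p_value)
}

# Synthetic stand-in for diabetes.csv (made-up numbers, not the real data)
set.seed(1)
fake <- data.frame(
  BMI = c(rnorm(200, mean = 30, sd = 6), rnorm(100, mean = 35, sd = 7)),
  Outcome = rep(c(0, 1), times = c(200, 100))
)

result <- perm_sd_test(fake$BMI, fake$Outcome, n_perm = 2000)
result$statistic
result$p_value
```

With the real data, the call would presumably be `perm_sd_test(diabetes$BMI, diabetes$Outcome)`. Note that shuffling raw values mixes any difference in group means into the comparison; if that is a concern, one option is to subtract each group's median from its BMI values before running the test.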