In R

Time: 2017-10-15 12:51:51

Tags: r matrix

Forgive me, I'm new to this. If anyone can help, or point me to resources where I can find help, I would be very grateful:

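I have a data table with 150,000 observations of 300 variables: some outcomes/symptoms (dependent variables) and some inputs (independent variables). For each symptom, I want descriptive statistics, plus the result of a chi-squared test of association with each input.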

For the descriptive statistics, I managed this by creating a matrix of the outcome variables called "symptom.matrix" and using "apply":

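# One row per symptom; columns are the column sum (N), the column mean (prev)
# and sqrt(p * (1 - p) / n) as the standard error (s.e.)
Desc.stats <- matrix(c(apply(symptom.matrix, 2, sum),
                       apply(symptom.matrix, 2, mean),
                       apply(symptom.matrix, 2, function(x) {
                         sqrt(mean(x) * (1 - mean(x)) / length(x))
                       })),
                     ncol = 3,
                     dimnames = list(c(...),
                                     c("N", "prev", "s.e.")))
Desc.stats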
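For reference, the same table can be sketched with the vectorised helpers colSums and colMeans, which avoid three separate apply passes. This is only an illustration under assumptions: the symptom.matrix built here is random 0/1 data with made-up column names (symptom1 to symptom5), not the real data set, and it assumes every symptom is coded 0/1 so that the column mean is a prevalence.

```r
# Hypothetical stand-in for the real data: 100 observations of 5 binary symptoms
set.seed(42)
symptom.matrix <- matrix(rbinom(5 * 100, 1, 0.3), ncol = 5,
                         dimnames = list(NULL, paste0("symptom", 1:5)))

# Prevalence of each symptom (column mean of a 0/1 variable)
prev <- colMeans(symptom.matrix)

# cbind keeps the symptom names as row names
Desc.stats <- cbind(N    = colSums(symptom.matrix),
                    prev = prev,
                    s.e. = sqrt(prev * (1 - prev) / nrow(symptom.matrix)))
Desc.stats
```

Because the row names come from the column names of symptom.matrix, no separate dimnames argument is needed in this variant.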

To get the chi-squared results, I run chisq.test on individual outcome/input pairs as follows, but I can't see how to apply it to symptom.matrix:

result1 <- chisq.test(symptom1, input1)
print(c(result1$statistic, result1$p.value))

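How can I extend this to cover the whole symptom matrix? Is it possible with chisq.test, or would I be better off going back to basics and writing the statistical functions myself?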

1 answer:

Answer 0: (score: 0)

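Consider nested lapply calls: the outer call iterates over each symptom, the inner call iterates over every input column, and the result is a nested list of test objects. The two objects passed to lapply are subsets of the original data frame, one holding all symptom columns and the other all input columns.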

Since the OP did not provide a sample of the actual data, the demonstration below uses random data:

set.seed(788)
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom", 
                        "VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom")

set.seed(992)
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input", 
                      "VisionorVoice.Input","Delusion.Input","UEAny.Input")

df <- data.frame(symptoms, inputs)

# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
                      lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s,i)))

Output (first list item)

chi_sq_list$Vision.Symptom

$Vision.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 241.22, df = 240, p-value = 0.4657


$Voice.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 247, df = 240, p-value = 0.3644


$Delofreference.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 289.25, df = 256, p-value = 0.07502


$Paranoia.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 322.11, df = 288, p-value = 0.08131


$VisionorVoice.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 215.22, df = 208, p-value = 0.351


$Delusion.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 218.47, df = 224, p-value = 0.5916


$UEAny.Input

    Pearson's Chi-squared test

data:  s and i
X-squared = 254.22, df = 256, p-value = 0.5196
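
The question asked for the statistic and p-value of every pair, so it may help to flatten the nested list into one table. The following is a minimal sketch, not part of the original answer: it rebuilds a small nested list on hypothetical 0/1 data (the data frame `demo` and its three symptom and two input columns are made up for illustration), then collects `statistic`, `parameter` (degrees of freedom) and `p.value` from each chisq.test result.

```r
# Hypothetical binary demo data: 3 symptoms and 2 inputs, 200 observations
set.seed(1)
n <- 200
demo <- data.frame(
  Vision.Symptom   = rbinom(n, 1, 0.3),
  Voice.Symptom    = rbinom(n, 1, 0.2),
  Paranoia.Symptom = rbinom(n, 1, 0.4),
  Vision.Input     = rbinom(n, 1, 0.5),
  Voice.Input      = rbinom(n, 1, 0.3)
)

# Same nested lapply pattern as above: outer list by symptom, inner by input
chi_sq_list <- lapply(demo[grep("\\.Symptom$", names(demo))], function(s)
  lapply(demo[grep("\\.Input$", names(demo))], function(i) chisq.test(s, i)))

# Collect one row per symptom/input pair
rows <- list()
for (s in names(chi_sq_list)) {
  for (i in names(chi_sq_list[[s]])) {
    res <- chi_sq_list[[s]][[i]]
    rows[[length(rows) + 1]] <- data.frame(
      symptom   = s,
      input     = i,
      statistic = unname(res$statistic),
      df        = unname(res$parameter),
      p.value   = res$p.value
    )
  }
}
chi_sq_table <- do.call(rbind, rows)
chi_sq_table

# Alternative view: matrix of p-values, inputs in rows and symptoms in columns
p_matrix <- sapply(chi_sq_list, function(per_symptom)
  sapply(per_symptom, function(res) res$p.value))
p_matrix
```

With 0/1 variables each test is on a 2 x 2 table, so chisq.test applies Yates' continuity correction by default and reports one degree of freedom. The long table is convenient for sorting or filtering by p-value, while the matrix gives a quick overview of all pairs.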