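Forgive me, I am new to this. If anyone can help, or point me to a resource where I can find help, I would be very grateful:
I have a data table of 150,000 observations on 300 variables, some of them outcomes/symptoms (dependent variables) and some of them inputs (independent variables). For each symptom, I would like descriptive statistics and the result of a chi-squared test of association with each input.
For the descriptive statistics, I managed this by creating a matrix of the outcome variables called "symptom.matrix" and using "apply".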
# One row per symptom; columns: count of positives (N), prevalence, binomial standard error
Desc.stats <- matrix(c(apply(symptom.matrix, 2, sum),
                       apply(symptom.matrix, 2, mean),
                       apply(symptom.matrix, 2, function(x)
                         sqrt(mean(x) * (1 - mean(x)) / length(x)))),
                     ncol = 3,
                     dimnames = list(c(...),
                                     c("N", "prev", "s.e.")))
Desc.stats
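For reference, the same table can probably be built more compactly with colSums and colMeans, which avoids the three apply calls and picks up the row names from the matrix itself. This is only a sketch: the small random 0/1 matrix and its column names (symptom1 to symptom4) are made up here to stand in for the real symptom.matrix.

```r
# Hypothetical 0/1 symptom matrix standing in for the real symptom.matrix
set.seed(1)
symptom.matrix <- matrix(rbinom(40, 1, 0.3), ncol = 4,
                         dimnames = list(NULL, paste0("symptom", 1:4)))

n    <- nrow(symptom.matrix)        # number of observations
prev <- colMeans(symptom.matrix)    # prevalence of each symptom

# cbind() keeps the column names of symptom.matrix as row names
Desc.stats <- cbind(N    = colSums(symptom.matrix),      # count of positives
                    prev = prev,
                    s.e. = sqrt(prev * (1 - prev) / n))  # binomial standard error
Desc.stats
```

On a matrix with 150,000 rows, colSums and colMeans should also be noticeably faster than the equivalent apply calls.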
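To get the chi-squared results, I have been using chisq.test on individual symptom and input pairs in the following way, but I cannot see how to apply this to symptom.matrix: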
result1 <- chisq.test(symptom1, input1)
print(c(result1$statistic, result1$p.value))
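How can I extend this to cover the whole symptom matrix? Is it possible to use chisq.test, or would I be better off going back to basics and writing the statistical functions myself?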
Answer 0 (score: 0)
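Consider nested lapply calls that iterate over every symptom and, for each one, over every input column, returning a nested list. The objects passed to lapply are the two subsets of the original data frame: all of the symptom columns and all of the input columns.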
Because the OP did not provide a sample of the actual data, random data are used below for demonstration:
set.seed(788)
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom",
"VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom")
set.seed(992)
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input",
"VisionorVoice.Input","Delusion.Input","UEAny.Input")
df <- data.frame(symptoms, inputs)
# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
  lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s, i)))
Output (first list item)
chi_sq_list$Vision.Symptom
$Vision.Input
Pearson's Chi-squared test
data: s and i
X-squared = 241.22, df = 240, p-value = 0.4657
$Voice.Input
Pearson's Chi-squared test
data: s and i
X-squared = 247, df = 240, p-value = 0.3644
$Delofreference.Input
Pearson's Chi-squared test
data: s and i
X-squared = 289.25, df = 256, p-value = 0.07502
$Paranoia.Input
Pearson's Chi-squared test
data: s and i
X-squared = 322.11, df = 288, p-value = 0.08131
$VisionorVoice.Input
Pearson's Chi-squared test
data: s and i
X-squared = 215.22, df = 208, p-value = 0.351
$Delusion.Input
Pearson's Chi-squared test
data: s and i
X-squared = 218.47, df = 224, p-value = 0.5916
$UEAny.Input
Pearson's Chi-squared test
data: s and i
X-squared = 254.22, df = 256, p-value = 0.5196
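Since the question only prints the statistic and the p-value of each test, the nested list can be flattened into a single table with one row per symptom/input pair. The following is a minimal sketch of one way to do that, not the only one. The binary data and the column names (A.Symptom, B.Symptom, X.Input, Y.Input) are invented for illustration and assume the real symptoms and inputs are coded 0/1; the name chi_sq_table is likewise hypothetical.

```r
# Hypothetical binary data standing in for the real symptom and input columns
set.seed(42)
n  <- 200
df <- data.frame(
  A.Symptom = rbinom(n, 1, 0.3),
  B.Symptom = rbinom(n, 1, 0.5),
  X.Input   = rbinom(n, 1, 0.4),
  Y.Input   = rbinom(n, 1, 0.6)
)

symptom_cols <- grep("\\.Symptom$", names(df), value = TRUE)
input_cols   <- grep("\\.Input$",   names(df), value = TRUE)

# Same nested lapply structure as above: one sub-list of tests per symptom
chi_sq_list <- lapply(df[symptom_cols], function(s)
  lapply(df[input_cols], function(i) chisq.test(s, i)))

# Flatten to one row per symptom/input pair, keeping only the summary numbers
chi_sq_table <- do.call(rbind, lapply(symptom_cols, function(s)
  do.call(rbind, lapply(input_cols, function(i) {
    test <- chi_sq_list[[s]][[i]]
    data.frame(symptom   = s,
               input     = i,
               statistic = unname(test$statistic),  # X-squared
               df        = unname(test$parameter),  # degrees of freedom
               p.value   = test$p.value)
  }))))
chi_sq_table
```

Note that for 2 x 2 tables such as these, chisq.test applies Yates' continuity correction by default; pass correct = FALSE if the uncorrected statistic is wanted.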