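I want to know whether my variables are correlated. This is the structure of the dataset; as you can see, it contains both continuous and categorical variables:
'data.frame': 189 obs. of 20 variables: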
$ age : num 24 31 32 35 36 26 31 24 35 36 ...
$ diplM2 : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ...
$ TimeDelcat : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ...
$ SeasonDel : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ...
$ BMIM2 : num 23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ...
$ WgtB2 : int 3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ...
$ sex : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ...
$ smoke : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ...
$ nRBC : num 0.1621 0.0604 0.1935 0.0527 0.1118 ...
$ CD4T : num 0.1427 0.2143 0.1432 0.0686 0.0979 ...
$ CD8T : num 0.1574 0.1549 0.1243 0.0804 0.0782 ...
$ NK : num 0.02817 0 0.04368 0.00641 0.02398 ...
$ Bcell : num 0.1033 0.1124 0.1468 0.0551 0.0696 ...
$ Mono : num 0.0633 0.0641 0.0773 0.0531 0.0656 ...
$ Gran : num 0.428 0.442 0.329 0.716 0.6 ...
$ chip : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ...
$ pos : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ...
$ trim1PM25ifdmv4: num 9.45 13.81 15.59 7.13 15.43 ...
$ trim2PM25ifdmv4: num 13.27 15.53 10.69 13.56 9.27 ...
$ trim3PM25ifdmv4: num 16.72 16.21 12.17 6.47 10.66 ...
When I run
chart.Correlation(variables, histogram = TRUE, method = "pearson")
I get an error. How can I fix this? Thanks.
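The `variables` object is not shown, but the most likely cause is that it still contains factor columns (`diplM2`, `sex`, `chip`, ...), which a Pearson correlation cannot use. The sketch below is a minimal illustration on a toy data frame `df` (a hypothetical name) built from the first five values of `age`, `WgtB2` and `sex` in the `str()` output. It also shows a subtle point for this dataset: `WgtB2` is an integer column, so testing `class(x) == "numeric"` drops it, while `is.numeric()` keeps it.

```r
# Toy stand-in for the real data (hypothetical name `df`), using the first
# five values of age (num), WgtB2 (int) and sex (factor) from str()
df <- data.frame(
  age   = c(24, 31, 32, 35, 36),
  WgtB2 = c(3740L, 3615L, 3705L, 3485L, 3420L),
  sex   = factor(c("2", "2", "1", "2", "2"))
)

# class() returns "integer" for WgtB2, so this test silently misses it
by_class <- names(df)[sapply(df, class) == "numeric"]
print(by_class)

# is.numeric() is TRUE for both double and integer columns
num_vars <- df[, sapply(df, is.numeric), drop = FALSE]
print(names(num_vars))

# Pearson correlation on the numeric subset only
print(cor(num_vars, method = "pearson"))

# If PerformanceAnalytics is installed, the same subset can be plotted:
# PerformanceAnalytics::chart.Correlation(num_vars, histogram = TRUE, method = "pearson")
```

On the real data, the same `sapply(..., is.numeric)` filter would keep `age`, `BMIM2`, `WgtB2`, the cell-type proportions and the three `trim*PM25ifdmv4` columns, and drop the factors.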
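Answer (score: 1)
I believe you only want correlations between the numeric variables. The code below does this, and it outputs only the unique pairs of correlations between the inputs.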
library(reshape2)
data <- data.frame(x1=rnorm(10),
x2=rnorm(10),
x3=rnorm(10),
x4=c("a","b","c","d","e","f","g","h","i","j"),
x5=c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))
data
x1 x2 x3 x4 x5
1 -1.2169793 0.5397598 0.4981513 a ab
2 -0.7032631 -2.1262837 -1.0377371 b sp
3 0.8766831 -0.2326975 -0.1219613 c sp
4 0.3405332 2.4766225 -1.1960618 d dd
5 0.1889945 0.3444534 1.9659062 e hg
6 0.8086956 0.4654644 -1.2526696 f hj
7 -0.6850181 -1.7657241 0.5156620 g qw
8 0.8518034 0.9484547 1.4784063 h dh
9 0.5191793 1.2246566 1.3867829 i ko
10 0.4568953 -0.6881464 0.3548839 j jk
#finding correlation for all numeric columns (is.numeric() also keeps integer columns)
corr=cor(data[sapply(data,is.numeric)])
#convert the correlation table to long format
res=melt(corr)
##keeping only one side of the correlations
res$type=apply(res,1,function(x)
paste(sort(c(as.character(x[1]),as.character(x[2]))),collapse="*"))
res=unique(res[,c("type","value")])
res
type value
x1*x1 1.00000000
x1*x2 0.44024939
x1*x3 0.04936654
x2*x2 1.00000000
x2*x3 0.08859169
x3*x3 1.00000000
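As an alternative sketch (not the method used in the answer above), the same unique pairs can be read directly off the upper triangle of the correlation matrix with base R's `upper.tri()`, which avoids both reshape2 and the string-sorting step. `diag = TRUE` keeps the self-correlations so the layout matches the table above; set it to `FALSE` to drop those trivial 1s. The `set.seed()` call is added here only to make the toy data reproducible, so the values will not match the output shown above.

```r
# Toy data in the same shape as the answer's example (seed added for reproducibility)
set.seed(1)
data <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                   x4 = letters[1:10], stringsAsFactors = FALSE)

# Correlation matrix of the numeric columns only
corr <- cor(data[sapply(data, is.numeric)])

# Logical mask selecting the upper triangle, diagonal included
keep <- upper.tri(corr, diag = TRUE)

# Row/column indices of every kept cell, in column-major order,
# which is the same order in which corr[keep] returns the values
idx <- which(keep, arr.ind = TRUE)
res <- data.frame(
  type  = paste(rownames(corr)[idx[, "row"]], colnames(corr)[idx[, "col"]], sep = "*"),
  value = corr[keep],
  stringsAsFactors = FALSE
)
print(res)
```

Because the pairs come straight from the matrix positions, there is no need to build and sort string keys to remove the mirrored duplicates.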