Question

我使用以下方法计算我的数据集的相关性：

cor( var1, var2, method = "method")

但我喜欢创建一个包含4个不同变量的相关矩阵。最简单的方法是什么？

Answer 1

在数据框上使用相同的函数（cor），例如：

> cor(VADeaths)
             Rural Male Rural Female Urban Male Urban Female
Rural Male    1.0000000    0.9979869  0.9841907    0.9934646
Rural Female  0.9979869    1.0000000  0.9739053    0.9867310
Urban Male    0.9841907    0.9739053  1.0000000    0.9918262
Urban Female  0.9934646    0.9867310  0.9918262    1.0000000

或者，在一个也包含离散变量的数据框中（有时也称为因子），尝试以下内容：

> cor(mtcars[,unlist(lapply(mtcars, is.numeric))])
            mpg        cyl       disp         hp        drat         wt        qsec         vs          am       gear        carb
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157  0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953 -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870 -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059 -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

Answer 2

如果您想将矩阵与我可以推荐的一些可视化结合起来（我使用内置的library(psych) pairs.panels(iris[1:4]) # select columns 1-4数据集）：

library(PerformanceAnalytics)
chart.Correlation(iris[1:4])

性能分析基本上做同样的事情，但包括显着性指标。

library(corrplot)
x <- cor(iris[1:4])
corrplot(x, type="upper", order="hclust")

或者这个简单明了的可视化：

exec("node myWritingChild.js");

Answer 3

请参阅corr.test包中的psych功能：

> corr.test(mtcars[1:4])
Call:corr.test(x = mtcars[1:4])
Correlation matrix 
       mpg   cyl  disp    hp
mpg   1.00 -0.85 -0.85 -0.78
cyl  -0.85  1.00  0.90  0.83
disp -0.85  0.90  1.00  0.79
hp   -0.78  0.83  0.79  1.00
Sample Size 
     mpg cyl disp hp
mpg   32  32   32 32
cyl   32  32   32 32
disp  32  32   32 32
hp    32  32   32 32
Probability value 
     mpg cyl disp hp
mpg    0   0    0  0
cyl    0   0    0  0
disp   0   0    0  0
hp     0   0    0  0

另一个无耻的自我广告：https://gist.github.com/887249

Answer 4

你可能想看看Quick-R，它有很多很好的小教程，介绍如何在R中进行基本统计。例如关于相关性：

http://www.statmethods.net/stats/correlations.html

Answer 5

您还可以计算所有变量的相关性，但不包括所选变量，例如：

mtcars <- data.frame(mtcars)
# here we exclude gear and carb variables
cors <- cor(subset(mtcars, select = c(-gear,-carb)))

此外，要计算每个变量与一列之间的相关性，您可以使用sapply()

# sapply effectively calls the corelation function for each column of mtcars and mtcars$mpg
cors2 <- sapply(mtcars, cor, y=mtcars$mpg)

计算两个以上变量的相关性？

5 个答案: