我正在尝试创建一个包含16个不同向量的相关矩阵,几乎所有内容都是我想要的,唯一的区别是我宁愿在散点图中使用最佳拟合线而不是平滑曲线。我已经看到一些其他的帖子提到使用函数调用的pair()部分来改变pch.Correlation,是否有类似的东西要求最合适的线而不是平滑的曲线?
我问,因为我觉得平滑的曲线可能会在散点图的某些部分给出错误的高度相关性,我知道相关性就在上半部分,但我仍然希望能够选择将散点图中的线从平滑更改为最佳拟合。
我的代码非常简单:
chart.Correlation(all.cell.types.rna.seq.table[,2:16], histogram=FALSE)
all.cell.types.rna.seq.table是一个包含16列的数据框,第一列是一个id号。
相关矩阵,平滑线而不是最佳拟合线:
在相关矩阵图像的下三角形的散点图中,我想要的是最佳拟合线而不是平滑曲线。
答案 0 :(得分:0)
我一直在寻找完全一样的东西……并且可能只使用基本功能pairs()
,如图here所示。这是使用数据集mtcars的示例:
reg <- function(x, y, ...) {
points(x,y, ...)
abline(lm(y~x), col = "red")
}
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- cor(x, y) # was abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 2 # or 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor) # was cex.cor * abs(r))
}
panel.hist <- function(x, ...) {
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
pairs(mtcars[, c(1,3,4,5,6,7)], lower.panel = reg, upper.panel = panel.cor, diag.panel = panel.hist)
但是它不如chart.Correlation()
好,并且由于我无法弄清楚如何将reg
函数传递到chart.Correlation()
中,我研究了{{3} }并通过直接在函数lower.panel = panel.smooth ==> lower.panel = reg
内直接对其进行更改来解决。所以这是mtcars的最后一个例子:
chart.Correlation.linear <-
function (R, histogram = TRUE, method=c("pearson", "kendall", "spearman"), ...)
{ # @author R Development Core Team
# @author modified by Peter Carl & Marek Lahoda
# Visualization of a Correlation Matrix. On top the (absolute) value of the correlation plus the result
# of the cor.test as stars. On botttom, the bivariate scatterplots, with a linear regression fit.
# On diagonal, the histograms with probability, density and normal density (gaussian) distribution.
x = checkData(R, method="matrix")
if(missing(method)) method=method[1] #only use one
cormeth <- method
# Published at http://addictedtor.free.fr/graphiques/sources/source_137.R
panel.cor <- function(x, y, digits=2, prefix="", use="pairwise.complete.obs", method=cormeth, cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- cor(x, y, use=use, method=method) # MG: remove abs here
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
if(missing(cex.cor)) cex <- 0.8/strwidth(txt)
test <- cor.test(as.numeric(x),as.numeric(y), method=method)
# borrowed from printCoefmat
Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
symbols = c("***", "**", "*", ".", " "))
# MG: add abs here and also include a 30% buffer for small numbers
text(0.5, 0.5, txt, cex = cex * (abs(r) + .3) / 1.3)
text(.8, .8, Signif, cex=cex, col=2)
}
#remove method from dotargs
dotargs <- list(...)
dotargs$method <- NULL
rm(method)
hist.panel = function (x, ...=NULL ) {
par(new = TRUE)
hist(x,
col = "light gray",
probability = TRUE,
axes = FALSE,
main = "",
breaks = "FD")
lines(density(x, na.rm=TRUE),
col = "red",
lwd = 1)
# adding line representing density of normal distribution with parameters correponding to estimates of mean and standard deviation from the data
ax.x = seq(min(x), max(x), 0.1) # ax.x containts points corresponding to data range on x axis
density.est = dnorm(ax.x, mean = mean(x), sd = sd(x)) # density corresponding to points stored in vector ax.x
lines(ax.x, density.est, col = "blue", lwd = 1, lty = 1) # adding line representing density into histogram
rug(x)
}
# Linear regression line fit over points
reg <- function(x, y, ...) {
points(x,y, ...)
abline(lm(y~x), col = "red")
}
# Draw the chart
if(histogram)
pairs(x, gap=0, lower.panel=reg, upper.panel=panel.cor, diag.panel=hist.panel)
else
pairs(x, gap=0, lower.panel=reg, upper.panel=panel.cor)
}
chart.Correlation.linear(mtcars[, c(1,3,4,5,6,7)], histogram = TRUE)