Question

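I have come across a number of situations where I want to plot more points than I really ought to, mainly because when I share my plots with people or embed them in papers, they take up too much space. It's very straightforward to randomly sample rows in a data frame.

If I wanted a truly random sample for a point plot, it's easy to say: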

library(ggplot2)  # provides qplot()
qplot(x, y, data = myDf[sample(nrow(myDf), 1000), ])

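However, I was wondering whether there are more effective (ideally canned) ways to specify the number of plot points such that the actual data is still accurately reflected in the plot. So here is an example. Suppose I am plotting something like the CCDF of a heavy-tailed distribution, e.g.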

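ccdf <- function(myList, density = FALSE)
{
  # generates the CCDF of a list or vector
  # (the density argument is not used in this version)
  freqs <- table(myList)
  X <- rev(as.numeric(names(freqs)))
  # as.vector() gives plain counts that cumsum() can add up
  Y <- cumsum(rev(as.vector(freqs)))
  data.frame(x = X, count = Y)
}
qplot(x, count, data = ccdf(rlnorm(10000, 3, 2.4)), log = 'xy')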

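This produces a plot where the points become increasingly dense along the x and y axes. Here it would be ideal to have fewer samples plotted for larger x or y values.

Does anybody have any tips or suggestions for dealing with similar issues?

Thanks, -e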

Answer 1

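In this situation I tend to use png files rather than vector-based graphics such as pdf or eps. The files are much smaller, although you lose resolution.

If it's a more conventional scatterplot, then using semi-transparent colours also helps, and it addresses the over-plotting problem as well. For example,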
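A rough way to check the size difference on your own data, as a sketch only: it draws the same base-graphics scatterplot to a PDF and to a PNG in temporary files and compares the sizes on disk. The number of points, the seed, and the image dimensions are arbitrary choices for illustration, and the PNG device has to be available in your R build.

```r
# Sketch: compare on-disk size of one scatterplot as PDF (vector) and PNG (raster).
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(50000)
y <- rnorm(50000)

pdf_file <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

pdf(pdf_file, width = 6, height = 6)        # size in inches
plot(x, y, pch = 20)
dev.off()

png(png_file, width = 600, height = 600)    # size in pixels
plot(x, y, pch = 20)
dev.off()

# File sizes in bytes, labelled by format
sizes <- file.size(c(pdf_file, png_file))
names(sizes) <- c("pdf", "png")
print(sizes)
```

The PDF stores every point as a separate drawing instruction, so its size grows with the number of points, while the PNG size depends mostly on the pixel dimensions.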

x <- rnorm(10000); y <- rnorm(10000)
qplot(x, y, colour=I(alpha("blue",1/25)))
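For what it's worth, the same idea can be written with the layered ggplot() interface, where alpha is an argument of geom_point(). This is just a sketch with the same simulated data; the data frame name df is made up for the example.

```r
library(ggplot2)

set.seed(1)   # arbitrary seed, for reproducibility only
df <- data.frame(x = rnorm(10000), y = rnorm(10000))

# alpha = 1/25 means roughly 25 overlapping points are needed for a fully opaque spot
p <- ggplot(df, aes(x, y)) +
  geom_point(colour = "blue", alpha = 1/25)
print(p)
```

Note that transparency does not reduce the file size of a vector format, since every point is still drawn; it only helps with over-plotting.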

Answer 2

Besides Rob's suggestions, one plot function I like, because it does the 'thinning' for you, is hexbin; there is an example at the R Graph Gallery.
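A minimal sketch of what that looks like, assuming the hexbin package from CRAN is installed. The simulated data and the choice of xbins = 40 are only for illustration.

```r
library(hexbin)   # CRAN package, must be installed separately

set.seed(1)       # arbitrary seed, for reproducibility only
x <- rnorm(10000)
y <- rnorm(10000)

# xbins is the number of hexagons across the x range
hb <- hexbin(x, y, xbins = 40)
plot(hb)          # one hexagon per non-empty cell, shaded by count

# Number of hexagons actually drawn, compared with the number of raw points
c(hexagons = hb@ncells, points = length(x))
```

The plot contains one shape per occupied cell rather than one per observation, which is what keeps the output small.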

Answer 3

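Here is one possible solution for downsampling the plot with respect to the x-axis, if it is log transformed. It takes the log of x, rounds that quantity to form bins, and picks the median x value in each bin: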

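downsampled_qplot <- function(x, y, data, rounding = 0, ...) {
  # assumes we are doing log='xy' or log='x'
  # note: x and y are ignored; the columns x and count of data are plotted
  # bin the rows by log(x) rounded to `rounding` decimal places
  group <- factor(round(log(data$x), rounding))
  # keep the middle row (by x) of each bin; nrow() counts rows,
  # whereas length() on a data frame counts columns
  d <- do.call(rbind, by(data, group,
    function(X) X[order(X$x)[ceiling(nrow(X) / 2)], ]))
  qplot(x, count, data = d, ...)
}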

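Using the definition of ccdf() from above, we can then compare the original plot of the CCDF of the distribution with the downsampled versions: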

myccdf=ccdf(rlnorm(10000,3,2.4))

qplot(x,count,data=myccdf,log='xy',main='original')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=1,main='rounding = 1')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=0,main='rounding = 0')

In PDF format, the original plot takes up 640K, while the downsampled versions take up 20K and 8K, respectively.
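To see how many points survive the binning without drawing anything, the same idea can be pulled out into a helper that returns the reduced data frame. This is a sketch in base R: downsample_log_x is a made-up name, and the CCDF data is rebuilt inline so the snippet runs on its own.

```r
# Hypothetical helper: same log-binning idea, but returns the reduced data frame.
downsample_log_x <- function(data, rounding = 0) {
  group <- round(log(data$x), rounding)        # bin label for every row
  pieces <- lapply(split(data, group), function(X) {
    # middle row by x within the bin
    X[order(X$x)[ceiling(nrow(X) / 2)], , drop = FALSE]
  })
  do.call(rbind, pieces)
}

set.seed(1)                                    # arbitrary seed
vals  <- rlnorm(10000, 3, 2.4)
freqs <- table(vals)
full  <- data.frame(x = rev(as.numeric(names(freqs))),
                    count = cumsum(rev(as.vector(freqs))))

reduced1 <- downsample_log_x(full, rounding = 1)
reduced0 <- downsample_log_x(full, rounding = 0)

# Rows before and after downsampling
c(original = nrow(full), rounding1 = nrow(reduced1), rounding0 = nrow(reduced0))
```

Each bin contributes exactly one row, so the number of plotted points is bounded by the number of distinct rounded log(x) values rather than by the size of the data.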

Answer 4

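I'd either make image files (png or jpeg devices), as Rob already mentioned, or I'd make a 2D histogram. An alternative to the 2D histogram is a smoothed scatterplot; it makes a similar graphic but has a smoother cutoff from dense to sparse regions of space.

If you've never seen addictedtor before, it's worth a look. It has some very nice graphics generated in R, with images and sample code.

Here's the sample code from the addictedtor site:

2-d histogram: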

require(gplots) 

# example data, bivariate normal, no correlation
x <- rnorm(2000, sd=4) 
y <- rnorm(2000, sd=1) 

# separate scales for each axis, this looks circular
hist2d(x,y, nbins=50, col = c("white",heat.colors(16))) 
rug(x,side=1) 
rug(y,side=2) 
box()

smoothscatter:

library("geneplotter")  ## from BioConductor
require("RColorBrewer") ## from CRAN

x1  <- matrix(rnorm(1e4), ncol=2)
x2  <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
x   <- rbind(x1,x2)

layout(matrix(1:4, ncol=2, byrow=TRUE))
op <- par(mar=rep(2,4))
smoothScatter(x, nrpoints=0)
smoothScatter(x)
smoothScatter(x, nrpoints=Inf,
              colramp=colorRampPalette(brewer.pal(9,"YlOrRd")),
              bandwidth=40)
colors  <- densCols(x)
plot(x, col=colors, pch=20)

par(op)
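In current versions of R, smoothScatter() ships in the graphics package and densCols() in grDevices, so the Bioconductor dependency is not strictly needed. A reduced sketch of the example above using only what comes with R (both functions rely on the recommended KernSmooth package being installed); the output file and its dimensions are arbitrary choices.

```r
# Sketch: smoothed scatterplot and density-coloured points with base R only.
set.seed(1)                                   # arbitrary seed
x <- rbind(matrix(rnorm(1e4), ncol = 2),
           matrix(rnorm(1e4, mean = 3, sd = 1.5), ncol = 2))

out_file <- tempfile(fileext = ".png")
png(out_file, width = 800, height = 400)      # raster output keeps the file small
op <- par(mfrow = c(1, 2), mar = rep(2, 4))
smoothScatter(x, nrpoints = 0)                # density image only, no raw points
plot(x, col = densCols(x), pch = 20)          # raw points coloured by local density
par(op)
dev.off()
```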

Maximum plotting points in R?
