Is there a way to reduce the memory a vector needs in R?

Posted: 2016-10-12 23:30:07

Tags: r memory out-of-memory

This is a follow-up to my earlier post here.

I am working in R.

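In short, my vectors are huge (13 GB), but they shouldn't be: the original CSV file is only a small fraction of that size. As you can imagine, 13 GB is more memory than my machine has, let alone the amount allocated to R.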

The code I'm currently using is:

library(sm)                                            # provides sm.ancova()
data1   <- read.csv("stackexample.csv")                # read in dummy data
data1C  <- data1[, 3:13]                               # cut off the end columns
SvDvDis <- data1C[c(-3, -4, -6, -7, -9, -10, -11)]     # drop individual columns
with(SvDvDis, sm.ancova(s, dt, dip, model = "none"))   # non-parametric ANCOVA on columns s, dt, dip

The dummy data file is available on my dropbox.

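Is there a way to reduce the memory this function uses, or is there alternative code or another function that performs the same analysis (a non-parametric ANCOVA) in a less memory-intensive way? To be clear, I'm not asking about the statistics; I'm asking how to do this more efficiently.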
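As a rough sanity check on the 13 GB figure: a numeric vector of n doubles needs about 8·n bytes, while a dense n × n matrix of doubles needs about 8·n² bytes. The base-R sketch below shows how quickly the quadratic term dominates. The row counts are illustrative, not taken from the dummy file, and whether `sm.ancova` actually builds an n × n object internally (for example a covariance matrix) is an assumption here, not something the question establishes; its help page is the place to check.

```r
# Rough memory arithmetic for double-precision storage (8 bytes per value).
# The row counts are illustrative only, not taken from stackexample.csv.
vector_bytes <- function(n) 8 * n      # length-n numeric vector
matrix_bytes <- function(n) 8 * n^2    # dense n-by-n numeric matrix

n_rows <- c(1000, 10000, 40000)
memory_table <- data.frame(
  n         = n_rows,
  vector_MB = vector_bytes(n_rows) / 2^20,
  matrix_GB = matrix_bytes(n_rows) / 2^30
)
print(memory_table)

# object.size() confirms the quadratic cost on a small matrix
# (it also counts a few hundred bytes of object header).
print(object.size(matrix(0, nrow = 1000, ncol = 1000)), units = "MB")
```

Because the matrix cost grows with the square of the row count, shrinking the number of rows (as the answer below does by sampling) cuts memory far more than dropping columns would.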

1 Answer:

Answer 0 (score: 0)

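Here is my suggestion; it runs fine on my modest laptop. You can supplement it with tests of the means to make sure the sample adequately reflects the population.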

library(dplyr)
library(sm)

data1 <- read.csv("stackexample.csv") ##read in dummy data

set.seed(1)                      # make the random sample reproducible
data2 <- sample_n(data1, 10000)  # make statistics work for you -- sample the data
sm.ancova(x     = data2$s,
          y     = data2$dt,
          group = data2$dip,
          model = "none") #non-parametric ANCOVA


Even with a sample of only 1,000 rows, I found no significant difference in the means:

t.test(data1$s, data2$s)
  Welch Two Sample t-test

data:  data1$s and data2$s
t = -1.4469, df = 1017.9, p-value = 0.1482
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -37.657822   5.692622
sample estimates:
mean of x mean of y 
 125.3137  141.2963

With a sample of 5,000:

data2 <- sample_n(data1, 5000) # make statistics work for you -- sample the data
t.test(data1$s, data2$s)
  Welch Two Sample t-test

data:  data1$s and data2$s
t = -1.0653, df = 5513.7, p-value = 0.2868
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.736700   4.359704
sample estimates:
mean of x mean of y 
 125.3137  130.5022
t.test(data1$dt, data2$dt)
  Welch Two Sample t-test

data:  data1$dt and data2$dt
t = -0.069479, df = 5507.8, p-value = 0.9446
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.39645  17.13709
sample estimates:
mean of x mean of y 
 515.6206  516.2503
t.test(data1$dip, data2$dip)
  Welch Two Sample t-test

data:  data1$dip and data2$dip
t = 1.2044, df = 5536.3, p-value = 0.2285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6268062  2.6241395
sample estimates:
mean of x mean of y 
 126.6667  125.6680
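The per-column comparisons above can be wrapped in a small helper so that each candidate sample size is checked in one call. This is a hypothetical base-R sketch, not the answerer's code: `compare_sample_means` is an illustrative name, it uses `sample()` instead of dplyr's `sample_n()`, and the usage data are synthetic stand-ins for `stackexample.csv`.

```r
# Hypothetical helper: draw a random subsample of rows and compare each
# column's mean with the full data using Welch two-sample t-tests.
compare_sample_means <- function(data, columns, size, seed = 1) {
  set.seed(seed)                          # reproducible subsample
  rows   <- sample(nrow(data), size)      # row indices, without replacement
  sub_df <- data[rows, columns, drop = FALSE]
  # t.test() defaults to the Welch test (var.equal = FALSE)
  vapply(columns,
         function(col) t.test(data[[col]], sub_df[[col]])$p.value,
         numeric(1))
}

# Usage with synthetic data standing in for stackexample.csv:
set.seed(42)
fake <- data.frame(s   = rexp(50000, 1 / 125),
                   dt  = rnorm(50000, 515, 300),
                   dip = runif(50000, 0, 250))
p_values <- compare_sample_means(fake, c("s", "dt", "dip"), size = 5000)
print(p_values)
```

`vapply()` keeps the column names, so the result is a named vector of p-values, one per variable.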

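Of course, you can use more or different statistics to validate your sample, depending on how thorough you want to be. You could also estimate a power curve in advance to choose the sample size.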
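A power curve for a two-sample t-test can be computed with base R's `power.t.test()`. In this sketch the smallest difference in means worth detecting (`target_delta`) and the standard deviation (`assumed_sd`) are hypothetical placeholders, not values estimated from the dummy data; substitute numbers that make sense for your variables.

```r
# Hypothetical inputs: the smallest mean difference worth detecting
# and a guess at the standard deviation of the variable.
target_delta <- 10
assumed_sd   <- 200

# Power of a two-sample t-test at several per-group sample sizes
n_grid <- c(500, 1000, 2500, 5000, 10000)
power_curve <- data.frame(
  n     = n_grid,
  power = sapply(n_grid, function(n)
    power.t.test(n = n, delta = target_delta, sd = assumed_sd,
                 sig.level = 0.05)$power)
)
print(power_curve)

# Or solve directly for the per-group n that gives 80% power
needed <- power.t.test(delta = target_delta, sd = assumed_sd,
                       sig.level = 0.05, power = 0.8)
ceiling(needed$n)
```

`power.t.test()` solves for whichever of `n`, `delta`, `sd`, `power` or `sig.level` is left unspecified, so the same function gives both the curve and the single required sample size.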

With a sample of 10,000, it took about 3 minutes on my laptop; with 1,000 it finished instantly.