Question

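I'm trying to figure out why some arrays I save as .rda files seem to take up more space on disk than others of the same size. Below are two objects, x and y, that have the same size, type, and dimensions. When I save each one, one file is 41 MB and the other is 6 MB. Can anyone think of a reason why this might happen?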

> dim(x)
[1]    71    14 10000
> dim(y)
[1]    71    14 10000 
> class(x)
[1] "array"
> class(y)
[1] "array"  
> object.size(y)
79520208 bytes
> object.size(x)
79520208 bytes

Answer 1

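They could both be character arrays, or lists, or data frames. Or one could be character (where the minimum element size would be one or two bytes) and the other numeric (8 bytes per element), or the larger one could hold long character elements, or any number of other possibilities. I get results somewhat like yours: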

x <- array(runif(71 * 14 * 10000), dim = c(71, 14, 10000))
save(x, file = "test.rda")
object.size(x)
# 79520208 bytes, and the file is over 50 MB
x <- array(sample(letters, 71 * 14 * 10000, replace = TRUE), dim = c(71, 14, 10000))
save(x, file = "test2.rda")
object.size(x)
# 79521456 bytes, and the file is around 8 MB
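Since class() reports "array" whatever the elements are, one way to check which of these possibilities applies is to look at the storage type instead. The following is only a minimal sketch on small stand-in arrays (the dimensions and contents are made up for illustration, not taken from the question):

```r
# Small stand-in arrays; the shapes and values here are illustrative assumptions.
set.seed(1)
x <- array(runif(24), dim = c(2, 3, 4))                          # numeric elements
y <- array(sample(letters, 24, replace = TRUE), dim = c(2, 3, 4)) # character elements

# class() cannot tell the two apart, and neither can dim()
class(x)
class(y)
identical(dim(x), dim(y))

# typeof() shows the underlying storage type of the elements
typeof(x)
typeof(y)
```

If typeof() returns the same value for both of the original arrays, the difference in file size has to come from the values themselves rather than from the element type.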

Answer 2

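If you save with save or saveRDS, compression is applied by default. Vectors with different contents will compress differently...

Try compress=FALSE with save and compare again...

In the example below, the file sizes differ by a factor of roughly 700: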

set.seed(42)
x <- runif(1e6)  # random values should not compress well...
y <- rep(0, 1e6) # zeroes should compress very well...
object.size(x) # 8000040 bytes
object.size(y) # 8000040 bytes

save('x', file='x.rds')
save('y', file='y.rds')
file.info(c('x.rds', 'y.rds'))$size
#[1] 5316773    7838

save('x', file='x.rds', compress=FALSE)
save('y', file='y.rds', compress=FALSE)
file.info(c('x.rds', 'y.rds'))$size
#[1] 8000048 8000048
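The same effect can be reproduced with arrays shaped like the ones in the question. This is a rough sketch only: the third dimension is cut to 100 to keep it quick, the contents are invented (the real x and y are unknown), and saveRDS with temporary files is used for convenience.

```r
set.seed(1)
d <- c(71, 14, 100)   # same first two dimensions as the question, shorter third
n <- prod(d)

x <- array(runif(n), dim = d)             # practically every value is distinct
y <- array(round(runif(n), 1), dim = d)   # only 11 distinct values (0, 0.1, ..., 1)

# Same type and dimensions, so the same size in memory
object.size(x)
object.size(y)

fx <- tempfile(fileext = ".rds")
fy <- tempfile(fileext = ".rds")

# Default (compressed) files: y should come out much smaller than x
saveRDS(x, fx)
saveRDS(y, fy)
compressed <- file.size(c(fx, fy))
compressed

# Uncompressed files: the two sizes should match
saveRDS(x, fx, compress = FALSE)
saveRDS(y, fy, compress = FALSE)
uncompressed <- file.size(c(fx, fy))
uncompressed

unlink(c(fx, fy))
```

So two numeric arrays can report identical object.size() values and still produce very different files, simply because one of them contains many repeated values.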

Object size difference
