I have saved some large matrices with saveRDS:
# create a big matrix and save it
x = matrix(c(1:(10*10000)),10000,10)
saveRDS(x, 'test.RDS')
Now I want to analyze a sample of the data, but so far I have been reading the full matrix before taking the sample:
# load the big matrix, then sample rows after the data has been read
x <- readRDS('test.RDS')
set.seed(1)
x[sample.int(dim(x)[1],5),]
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017
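I would like to know whether it is possible to read only a sample of the data stored in the RDS file. That is, the whole matrix would not be read into memory before sampling; the data that does not belong to the sample would somehow be skipped.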
I tried the following, which gives the same result (but still reads the whole file):
# find out the size of the matrix and load only the part of the matrix which is needed?
n <- dim(readRDS('test.RDS'))[1]
set.seed(1)
readRDS('test.RDS')[sample.int(n,5),]
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017
How can I read a sample from an RDS file without temporarily holding the complete data in memory?
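Alternatively, what kind of storage and loading functions should be used so that only a sample can be read from a file containing a matrix or data frame?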
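To make the second question concrete, here is a rough sketch of the kind of thing I have in mind, not a tested solution. It assumes the matrix is an integer matrix written as raw 4-byte values in column-major order with writeBin, so that single elements can be fetched by byte offset with seek and readBin. The helper name read_rows and the file layout are my own assumptions, and the dimensions have to be stored or known separately.

```r
# Sketch: store the matrix as plain 4-byte integers (column-major order)
# so that individual elements can be located by byte offset.
x <- matrix(1:(10 * 10000), nrow = 10000, ncol = 10)
path <- tempfile(fileext = ".bin")
writeBin(as.vector(x), path, size = 4L)

# Hypothetical helper: read only the requested rows from the binary file.
# nrow and ncol must be known in advance; they are not stored in the file.
read_rows <- function(path, rows, nrow, ncol) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- matrix(NA_integer_, nrow = length(rows), ncol = ncol)
  for (j in seq_len(ncol)) {
    for (k in seq_along(rows)) {
      # element (i, j) sits at position (j - 1) * nrow + (i - 1), 4 bytes each
      offset <- ((j - 1) * nrow + (rows[k] - 1)) * 4
      seek(con, where = offset, origin = "start")
      out[k, j] <- readBin(con, what = "integer", n = 1L, size = 4L)
    }
  }
  out
}

# Usage: sample row indices first, then read only those rows from disk.
set.seed(1)
rows <- sample.int(10000, 5)
read_rows(path, rows, nrow = 10000, ncol = 10)
```

This only touches length(rows) * ncol values on disk, but it does one seek per element, and the R documentation cautions against relying on seek on some platforms (Windows in particular), so I am not sure it is a good approach in practice.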