Question

I saved some large matrices with saveRDS:

# create some big matrix and save it
x <- matrix(1:(10 * 10000), 10000, 10)
saveRDS(x, 'test.RDS')

Now I want to analyse a sample of the data, but so far I have been reading the complete matrix before taking the sample:

# load big matrix and take a sample on the data after reading the data
x <- readRDS('test.RDS')
set.seed(1)
x[sample.int(dim(x)[1],5),]

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017

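I would like to know whether it is possible to read only a sample of the data stored in the RDS file. That is, the whole matrix would not be read into memory before sampling; instead, the data that is not part of the sample would somehow be skipped.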

I tried the following and got the same result:

# find out the size of the matrix and load only the part of the matrix which is needed?
n <- dim(readRDS('test.RDS'))[1]
set.seed(1)
readRDS('test.RDS')[sample.int(n, 5), ]

     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] 2656 12656 22656 32656 42656 52656 62656 72656 82656 92656
[2,] 3721 13721 23721 33721 43721 53721 63721 73721 83721 93721
[3,] 5728 15728 25728 35728 45728 55728 65728 75728 85728 95728
[4,] 9080 19080 29080 39080 49080 59080 69080 79080 89080 99080
[5,] 2017 12017 22017 32017 42017 52017 62017 72017 82017 92017

How can I read a sample from an RDS file without temporarily holding the complete data in memory?

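Alternatively, what kind of storage and loading functions should be used so that only a sample can be read from a file containing a matrix or data frame?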

Is it possible to read only a sample of the data with readRDS?
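To make the storage question concrete, here is a minimal sketch of one possible alternative, not a confirmed answer. It assumes the matrix is integer, that its dimensions are known when reading, and that it is acceptable to store the data as a raw binary file instead of RDS. The file name and the helper `read_rows` are hypothetical. The idea is to write the matrix row by row, so that each row is contiguous on disk, and then `seek` to the sampled rows only. Note that the R documentation discourages `seek` on Windows, so this may not be portable.

```r
# Sketch only: base R, integer matrix, dimensions assumed known by the reader.
x <- matrix(1:(10 * 10000), 10000, 10)

path <- tempfile(fileext = ".bin")  # hypothetical file name

# t(x) flattened column by column is x flattened row by row,
# so every original row occupies one contiguous block in the file.
con <- file(path, "wb")
writeBin(as.integer(t(x)), con, size = 4L)
close(con)

# Hypothetical helper: read only the requested rows from the binary file.
read_rows <- function(path, rows, ncol, size = 4L) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- matrix(NA_integer_, nrow = length(rows), ncol = ncol)
  for (i in seq_along(rows)) {
    # Byte offset of the first element of row rows[i]
    seek(con, where = (rows[i] - 1) * ncol * size, origin = "start")
    out[i, ] <- readBin(con, what = "integer", n = ncol, size = size)
  }
  out
}

set.seed(1)
idx <- sample.int(nrow(x), 5)
s <- read_rows(path, idx, ncol = ncol(x))
```

Only 5 * 10 integers are read here rather than the whole matrix. The trade-off is that the file carries no metadata (type, dimensions, dimnames), so these would have to be stored separately.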

0 answers: