Question

I am trying to use the scan() function in R to read a large file (hap_file) incrementally, in matrix form:

x=  matrix(scan(hap_file, what = "character",quiet = TRUE, nlines=2500000))
y=  matrix(scan(hap_file, what = "character",quiet = TRUE, skip=2500000, nlines=2500000))
z=  matrix(scan(hap_file, what = "character",quiet = TRUE, skip=5000000, nlines=2500000))

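hap_file has 1006 rows and 7,500,000 columns and contains only the values 0, 1, and 2 (tab-separated). When I try to read the whole of hap_file at once with scan(), it gives me a "too many items" error, so I chose to read it in parts with scan(). Am I doing this correctly?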

I then combine the matrices by row using the rbind function:

tmp_haplos =  matrix(rbind(x, y, z),nrow = tmp.nhap)

But I get an error message:

Error in rbind(x, y, z) : negative extents to matrix

What does this error mean, and how can I fix it?

Answer 1

You can use the bigmemory or ff package. Below is an example of reading and writing a large dataset with the ff package:

library(ff)

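# Simulate a 1000 x 100 matrix of 0/1/2 values and convert it to an ffdf
set.seed(123)
n <- 1000
m <- 100
fd <- as.ffdf(as.data.frame(matrix(sample(0:2, n * m, replace = TRUE), ncol = m, nrow = n)))

# Write the ffdf to disk as CSV
write.csv.ffdf(fd, file = "test.csv")

# Read the CSV back into an ffdf and print its first and last rows and columns
fd_read <- read.csv.ffdf(file = "test.csv", header = TRUE)
matprint(fd_read)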

Output:

     V1 V2 V3 V4 V5 V6 V7 V8 : V93 V94 V95 V96 V97 V98 V99 V100
1     0  0  0  0  0  0  1  2 :   1   0   2   1   2   0   0    2
2     2  1  0  2  2  2  1  1 :   2   0   1   0   2   1   1    1
3     1  0  0  1  1  2  1  0 :   0   1   1   0   1   2   1    2
4     2  2  1  1  2  1  1  2 :   1   2   0   0   0   0   1    1
5     2  2  1  0  0  0  0  0 :   0   0   1   1   1   2   2    2
6     0  1  1  1  0  0  2  2 :   2   1   0   2   2   0   2    1
7     1  2  1  2  2  0  0  1 :   0   2   2   2   0   2   0    2
8     2  0  0  1  0  1  2  0 :   1   2   0   0   0   0   2    2
:     :  :  :  :  :  :  :  : :   :   :   :   :   :   :   :    :
993   1  2  2  1  2  0  1  2 :   0   2   0   2   0   1   2    1
994   1  0  0  2  2  1  2  1 :   1   0   0   2   0   1   2    2
995   1  0  2  1  1  1  0  2 :   0   2   2   0   1   1   2    1
996   2  2  0  0  0  2  1  0 :   2   2   0   1   1   2   2    2
997   1  0  2  2  2  2  0  0 :   2   1   0   2   2   0   1    1
998   1  2  0  2  0  2  0  2 :   1   1   1   2   1   2   0    0
999   2  1  1  0  2  2  2  2 :   1   1   2   1   0   0   1    2
1000  0  1  2  1  0  2  2  1 :   1   2   0   1   2   0   2    0
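
On the scan() calls in the question: skip and nlines count lines of the file, which are its rows, not its columns. With only 1006 lines in hap_file, skip=2500000 jumps past the end of the file. The file also holds 1006 * 7,500,000 values, which is more than the largest R integer (2^31 - 1); that is a plausible cause of the "too many items" error, though it is not confirmed here. Below is a minimal base R sketch of reading blocks of complete rows. The small demo file and the read_rows() helper are assumptions made for illustration and are not part of the original question or answer.

```r
# Hypothetical small stand-in for hap_file: 6 rows x 4 columns, tab-separated.
demo_file <- tempfile(fileext = ".txt")
set.seed(1)
demo <- matrix(sample(0:2, 6 * 4, replace = TRUE), nrow = 6)
write.table(demo, demo_file, sep = "\t", row.names = FALSE, col.names = FALSE)

# Size of the real file from the question (computed as a double).
n_items <- 1006 * 7500000
too_big <- n_items > .Machine$integer.max  # compared with 2^31 - 1

# read_rows() is a hypothetical helper. skip and nlines are counted in
# lines of the file, so each call returns a block of complete rows.
read_rows <- function(path, skip, nlines, ncols) {
  vals <- scan(path, what = integer(), sep = "\t",
               skip = skip, nlines = nlines, quiet = TRUE)
  # scan() returns the values row by row, so fill the matrix by row.
  matrix(vals, ncol = ncols, byrow = TRUE)
}

# Read rows 1-3 and rows 4-6 separately, then stack them.
block1 <- read_rows(demo_file, skip = 0, nlines = 3, ncols = 4)
block2 <- read_rows(demo_file, skip = 3, nlines = 3, ncols = 4)
combined <- rbind(block1, block2)
```

At the real size, the combined matrix would need roughly 30 GB even as 4-byte integers, so stacking every block in memory is unlikely to be practical. That is where a disk-backed structure such as the ffdf shown above helps.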
