Reading a matrix from a fixed-width file in R

Time: 2013-02-18 09:42:12

Tags: r file matrix

I'm new to the R world, and I have a file containing a series of lines like this:

"0000010000010000000101000001000000011000000001
 0000000000000000000000010001000001001000110001
 0000100000000000000000010000000000000000010100
 0100000001100000000001001001100000010000000001
 0001000000000100010000010000000000010000000000"

I want to build a matrix starting from this string. So far I have written this code:

for(line in readLines(ff)){
     line <- as.numeric(substring(line, seq(1,nchar(line),1), seq(1,nchar(line),1)))
}

But it only extracts the lines from the file. How can I build a matrix from the line vectors?

2 answers:

Answer 0 (score: 6)

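Edit: Thanks to suggestions from Ananda Matho and agstudy, here is a better approach that works out the widths argument automatically. If your data is in a file named test.txt, you can do the following: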

width <- nchar(readLines("test.txt", n=1))
m <- as.matrix(read.fwf("test.txt", widths=rep(1,width)))
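Note that this takes the width from the first line only, so it assumes every line in the file has the same length. A minimal sketch of checking that before reading is shown here; the temporary file stands in for test.txt, and the sample rows and the names `path`, `lines` and `line_widths` are illustrative assumptions, not part of the original answer:

```r
# Write a few sample rows to a temporary file standing in for test.txt
path <- tempfile(fileext = ".txt")
writeLines(c("00100", "01010", "11111"), path)

# Read all lines and confirm they share one length
lines <- readLines(path)
line_widths <- nchar(lines)
stopifnot(length(unique(line_widths)) == 1)

# Read one single-character field per column
m <- as.matrix(read.fwf(path, widths = rep(1, line_widths[1])))
dim(m)
```

If the check fails, the file likely contains a blank or truncated line that should be handled before building the matrix.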

I assume that each 0/1 is a separate value. In that case, you can use read.fwf, which reads data by specifying the width of each field:

text <- "0000010000010000000101000001000000011000000001
0000000000000000000000010001000001001000110001
0000100000000000000000010000000000000000010100
0100000001100000000001001001100000010000000001
0001000000000100010000010000000000010000000000"

m <- as.matrix(read.fwf(textConnection(text), widths=rep(1,46)))

This gives:

R> m
     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
[1,]  0  0  0  0  0  1  0  0  0   0   0   1   0   0   0   0   0   0   0
[2,]  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
[3,]  0  0  0  0  1  0  0  0  0   0   0   0   0   0   0   0   0   0   0
[4,]  0  1  0  0  0  0  0  0  0   1   1   0   0   0   0   0   0   0   0
[5,]  0  0  0  1  0  0  0  0  0   0   0   0   0   1   0   0   0   1   0
     V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36
[1,]   1   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   1
[2,]   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1   0   0
[3,]   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0
[4,]   0   0   1   0   0   1   0   0   1   1   0   0   0   0   0   0   1
[5,]   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1
     V37 V38 V39 V40 V41 V42 V43 V44 V45 V46
[1,]   1   0   0   0   0   0   0   0   0   1
[2,]   1   0   0   0   1   1   0   0   0   1
[3,]   0   0   0   0   0   1   0   1   0   0
[4,]   0   0   0   0   0   0   0   0   0   1
[5,]   0   0   0   0   0   0   0   0   0   0

In your case, replace textConnection(text) with your file name, and change the value 46 in rep(1,46) to the number of values in each row of the matrix.
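To make the role of widths clearer, here is a small sketch with fields of different sizes. The two input lines and the names `con` and `df` are made up for illustration; each line is cut into a 2-character field followed by a 3-character field:

```r
# Each line holds a 2-character field followed by a 3-character field
con <- textConnection(c("12345", "67890"))
df <- read.fwf(con, widths = c(2, 3))
close(con)

# df has two columns, V1 and V2, one per entry in widths
df
```

Using rep(1, n) as in the answer above is the special case where every field is exactly one character wide.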

Answer 1 (score: 6)

You can also use:

t <- readLines(textConnection("0000010000010000000101000001000000011000000001
0000000000000000000000010001000001001000110001
0000100000000000000000010000000000000000010100
0100000001100000000001001001100000010000000001
0001000000000100010000010000000000010000000000"))

do.call("rbind", lapply(strsplit(t, ""), as.numeric))

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0    0    0    0    0    1    0    0    0     0     0     1     0     0
[2,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
[3,]    0    0    0    0    1    0    0    0    0     0     0     0     0     0
[4,]    0    1    0    0    0    0    0    0    0     1     1     0     0     0
[5,]    0    0    0    1    0    0    0    0    0     0     0     0     0     1
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     0     0     0     0     0     1     0     1     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0     1     0     0
[3,]     0     0     0     0     0     0     0     0     0     1     0     0
[4,]     0     0     0     0     0     0     0     1     0     0     1     0
[5,]     0     0     0     1     0     0     0     0     0     1     0     0
     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38]
[1,]     0     1     0     0     0     0     0     0     0     1     1     0
[2,]     0     1     0     0     0     0     0     1     0     0     1     0
[3,]     0     0     0     0     0     0     0     0     0     0     0     0
[4,]     0     1     1     0     0     0     0     0     0     1     0     0
[5,]     0     0     0     0     0     0     0     0     0     1     0     0
     [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46]
[1,]     0     0     0     0     0     0     0     1
[2,]     0     0     1     1     0     0     0     1
[3,]     0     0     0     1     0     1     0     0
[4,]     0     0     0     0     0     0     0     1
[5,]     0     0     0     0     0     0     0     0
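The steps in this approach can be seen more easily on a smaller input. The sketch below uses made-up rows and the hypothetical names `rows`, `chars`, `m` and `m2`; it also shows a variant with sapply, which is close to the per-line conversion attempted in the question:

```r
rows <- c("0101", "1100", "0011")

# strsplit with an empty pattern splits each string into single characters
chars <- strsplit(rows, "")

# Convert each character vector to numeric and stack the results as rows
m <- do.call(rbind, lapply(chars, as.numeric))

# Variant: sapply produces one column per input line, so transpose it
m2 <- t(sapply(chars, as.numeric))

# Both constructions should yield the same 3 x 4 matrix
identical(m, m2)
```

The input is named `rows` here rather than `t`, since `t` is also the name of R's transpose function.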