Question

Using R, what is the best way to read a symmetric matrix from a file that omits the upper triangular part? For example:

1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000

I tried read.table, but without success.

Answer 1

Here is a read.table solution that is both loopless and *apply-less:

txt <- "1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000"
 # Could use clipboard or read this from a file as well.
# col.names forces 6 columns; paste0 gives "V1".."V6" as in the output below.
mat <- data.matrix( read.table(text=txt, fill=TRUE, col.names=paste0("V", 1:6))  )
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
> mat
        V1    V2    V3    V4    V5    V6 
[1,] 1.000 0.505 0.569 0.602 0.621 0.603
[2,] 0.505 1.000 0.422 0.467 0.482 0.450
[3,] 0.569 0.422 1.000 0.926 0.877 0.878
[4,] 0.602 0.467 0.926 1.000 0.874 0.894
[5,] 0.621 0.482 0.877 0.874 1.000 0.937
[6,] 0.603 0.450 0.878 0.894 0.937 1.000
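
The `col.names` argument matters here because read.table inspects only the first five lines of input to decide how many columns there are, so a six-line triangle would otherwise be read with too few columns. The following is a minimal sketch of one way to avoid hard-coding the 6, assuming whitespace-separated fields; it uses `count.fields` from base R, and the variable names `con` and `n` are just illustrative.

```r
txt <- "1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000"

# The widest line gives the matrix dimension; count.fields() scans every line.
con <- textConnection(txt)
n <- max(count.fields(con))
close(con)

# Same approach as above, but with the column count derived from the data.
mat <- data.matrix(read.table(text = txt, fill = TRUE,
                              col.names = paste0("V", seq_len(n))))
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]

# isSymmetric() also compares dimnames, so drop them before checking.
isSymmetric(unname(mat))
```

For a real file, `count.fields("yourfile.txt")` and `read.table("yourfile.txt", ...)` should work the same way, without the text connection.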

Answer 2

I copied your text and then imported it with tt <- file('clipboard','rt'). For a standard file:

tt <- file("yourfile.txt",'rt')
a <- readLines(tt)
b <- strsplit(a,"  ") #insert delimiter here; can use regex
b <- lapply(b,function(x) {
  x <- as.numeric(x)
  length(x) <- max(unlist(lapply(b,length))); 
  return(x)
})
b <- do.call(rbind,b)
b[is.na(b)] <- 0
#kinda kludgy way to get the symmetric matrix
b <- b + t(b) - diag(b[1,1],nrow=dim(b)[1],ncol=dim(b)[2])
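
Note that the last line subtracts b[1,1] along the whole diagonal, which is fine for a correlation matrix where every diagonal entry is 1. As a hedged sketch of the same idea for a matrix whose diagonal entries differ, one could subtract the actual diagonal instead. The values below are made up purely for illustration, and `full` is a hypothetical name.

```r
# Lower-triangular matrix with zeros above the diagonal (illustrative values only).
b <- rbind(c(2.0, 0.0, 0.0),
           c(0.5, 3.0, 0.0),
           c(0.1, 0.2, 4.0))

# b + t(b) counts each diagonal entry twice, so subtract the diagonal once.
# diag(b) extracts the diagonal as a vector; diag(<vector>) rebuilds a diagonal matrix.
full <- b + t(b) - diag(diag(b))
full
```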

Answer 3

I'm posting this anyway, although I prefer Blue Magister's approach. Still, there may be something useful here.

mat <- readLines(n=6)
1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000

nmat <- lapply(mat, function(x) unlist(strsplit(x, "\\s+")))
lens <- sapply(nmat, length)
dlen <- max(lens) -lens
bmat <- lapply(seq_along(nmat), function(i) {
    as.numeric(c(nmat[[i]], rep(NA, dlen[i])))
})
mat <- do.call(rbind, bmat)
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
mat

Answer 4

This approach also works when the dimensions of the matrix are not known in advance.

# read file as a vector
mat <- scan("file.txt", what = numeric())

# calculate the number of columns (and rows)
ncol <- (sqrt(8 * length(mat) + 1) - 1) / 2

# index of the diagonal values
diag_idx <- cumsum(seq.int(ncol))

# generate split index
split_idx <- cummax(sequence(seq.int(ncol)))
split_idx[diag_idx] <- split_idx[diag_idx] - 1

# split vector into list of rows
splitted_rows <- split(mat, f = split_idx)

# generate matrix
mat_full <- suppressWarnings(do.call(rbind, splitted_rows))
mat_full[upper.tri(mat_full)] <- t(mat_full)[upper.tri(mat_full)]


   [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
0 1.000 0.505 0.569 0.602 0.621 0.603
1 0.505 1.000 0.422 0.467 0.482 0.450
2 0.569 0.422 1.000 0.926 0.877 0.878
3 0.602 0.467 0.926 1.000 0.874 0.894
4 0.621 0.482 0.877 0.874 1.000 0.937
5 0.603 0.450 0.878 0.894 0.937 1.000
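
The column count above comes from the fact that a lower triangle with its diagonal holds L = n(n+1)/2 values, which solves to n = (sqrt(8L + 1) - 1) / 2. As an alternative sketch built on the same `scan` idea (not the method used above), the values can be assigned directly into the upper triangle: reading the lower triangle row by row gives the same order as walking the upper triangle column by column, which is how R indexes a matrix. The names `vals` and `m` are illustrative, and a shortened 3 x 3 input is used to keep the example small.

```r
vals <- scan(text = "1.000
.505  1.000
.569  .422  1.000", quiet = TRUE)

# L = n(n+1)/2  =>  n = (sqrt(8L + 1) - 1) / 2
n <- (sqrt(8 * length(vals) + 1) - 1) / 2
stopifnot(n == round(n))  # the number of values must be a triangular number

m <- matrix(0, n, n)
# Column-major order of the upper triangle matches the row-wise file order.
m[upper.tri(m, diag = TRUE)] <- vals
# Mirror the upper triangle into the lower one.
m[lower.tri(m)] <- t(m)[lower.tri(m)]
m
```

This avoids both the split index and the recycling warnings that `suppressWarnings` hides, at the cost of relying on R's column-major indexing.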
