Running a loop to build a co-occurrence matrix

Time: 2014-10-26 02:43:13

Tags: r matrix

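I have the following data frame and would like to:

(1) Build a 4x4 co-occurrence matrix.

(2) Use a loop to build it, because my real data set is larger and has more variables.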

a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a,b)

out <- matrix(0, 
              nrow = 4, # how do I call just the levels?
              ncol = 4)
out

The following code does not work, but it may help show what I am trying to do.

for (i in 1:nrow(df)) {
  ind <- which(lvls == df[i, "a"]) 
  out[i, ind] <- 1 
}
out

# loop over variables in b
for (j in 1:nrow(df)) {
  ind <- which(lvls == df[j, "b"]) 
  out[j, ind] <- 1 
}
out

This is the output I am hoping for:

       [a]  [b]  [c]  [d]
[a]     0    4    4    0
[b]     4    0    0    4
[c]     4    0    0    4
[d]     0    4    4    0

Any help would be great. Thanks in advance!

1 answer:

Answer 0: (score: 3)

You could try

 lvls <- sort(as.character(unique(unlist(df))))
 df[] <- lapply(df, function(x) factor(x, levels=lvls) )
 m1 <- table(df)
 # mirror the upper triangle; going through t() keeps (i, j) paired with (j, i)
 m1[lower.tri(m1)] <- t(m1)[lower.tri(m1)]
 class(m1) <- "matrix"
 dimnames(m1) <- unname(dimnames(m1)) #as suggested by @Richard Scriven
 m1
 #    a b c d
 #  a 0 4 4 0
 #  b 4 0 0 4
 #  c 4 0 0 4
 #  d 0 4 4 0
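
Since the question asks specifically for a loop, here is a minimal sketch of a loop-based version, assuming the columns are character vectors (`stringsAsFactors = FALSE`). The names `r` and `k` are only illustrative, and the matrix is sized from the levels rather than hard-coded, which addresses the "how do I call just the levels?" comment in the question.

```r
# Data from the question, kept as character columns
a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# All distinct values across both columns define the rows and columns
lvls <- sort(unique(unlist(df, use.names = FALSE)))

# Size the matrix from the levels instead of hard-coding 4
out <- matrix(0, nrow = length(lvls), ncol = length(lvls),
              dimnames = list(lvls, lvls))

# One pass over the rows: each row contributes one (a, b) pair
for (i in seq_len(nrow(df))) {
  r <- match(df[i, "a"], lvls)   # position of the value from column a
  k <- match(df[i, "b"], lvls)   # position of the value from column b
  out[r, k] <- out[r, k] + 1
  out[k, r] <- out[k, r] + 1     # mirror so the matrix is symmetric
}
out
```

With this data the counts match the desired output in the question. Note that a row where both columns hold the same value would add 2 to the diagonal; whether that is wanted depends on how self-pairs should be counted.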

Update

Suppose your data has changed (example provided by @user20650)

df[1, ] <- c("b", "a")
df[] <- lapply(df, function(x) factor(x, levels=lvls) )
m1 <- table(df)
m2 <- m1 + t(m1)
m2 #you can convert to class `matrix` and change the dimnames as above
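# (the output below lists the levels as b, a, c, d, i.e. in order of first
#  appearance, as from unique(unlist(df)) without sort(); with the sorted
#  lvls defined earlier, rows and columns print as a, b, c, d)
#    b
# a   b a c d
#   b 0 4 0 4
#   a 4 0 4 0
#   c 0 4 0 4
#   d 4 0 4 0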
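
A side note on mirroring one triangle onto the other: `lower.tri()` and `upper.tri()` both select elements in column-major order, so from 4x4 upwards the two selections do not line up as (i, j) with (j, i). The sketch below uses a made-up matrix `m` (not the data from the question) to show the mismatch, and why adding the transpose, or indexing through `t()`, is the safer route.

```r
# A 4x4 matrix whose upper triangle holds six distinct values
m <- matrix(0, 4, 4)
m[upper.tri(m)] <- 1:6

# Element-by-element copy: both selections are in column-major order,
# which pairs position (4,1) with (2,3) instead of with (1,4)
naive <- m
naive[lower.tri(naive)] <- naive[upper.tri(naive)]
isSymmetric(naive)   # FALSE

# Going through the transpose pairs (i, j) with (j, i)
viaT <- m
viaT[lower.tri(viaT)] <- t(viaT)[lower.tri(viaT)]
isSymmetric(viaT)    # TRUE

# Adding the transpose gives the same result here, because the
# lower triangle of m starts out as all zeros
all.equal(viaT, m + t(m))
```

The element-by-element copy can still give a symmetric result when the copied values happen to line up, so a correct-looking result on one data set is not a guarantee for another.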

UPDATE2

If you don't want a symmetric matrix and would rather keep the actual counts

 df[] <- lapply(df, function(x) factor(x, levels=lvls) )
 m1 <- table(df)
 indx <- !m1 & lower.tri(m1)
 m1[indx] <- t(m1)[indx]  # fill each empty lower cell from its mirror cell (j, i)
 class(m1) <- "matrix"
 dimnames(m1) <- unname(dimnames(m1))
 m1
 #  b a c d
 #b 0 1 0 4
 #a 3 0 4 0
 #c 0 4 0 4
 #d 4 0 4 0

 table(as.character(interaction(df,sep="")))

 #ab ac ba bd cd 
 #3  4  1  4  4 
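
To see the difference between ordered and unordered pair counts on the changed data, here is a small sketch using only base R. It assumes character columns; `ordered` and `unordered` are illustrative names, and `pmin()`/`pmax()` are used to put the alphabetically smaller value first.

```r
# Changed data: the first row is now (b, a) instead of (a, b)
a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a, b, stringsAsFactors = FALSE)
df[1, ] <- c("b", "a")

# Ordered pairs: "ab" and "ba" are counted separately
ordered <- table(paste0(df$a, df$b))
ordered

# Unordered pairs: sort each pair before pasting, so "ba" folds into "ab"
unordered <- table(paste0(pmin(df$a, df$b), pmax(df$a, df$b)))
unordered
```

The ordered counts agree with the `interaction()` table above; the unordered version is what a symmetric co-occurrence matrix summarizes.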

UPDATE3

As for multiple variables, I am not sure what the expected result is, but perhaps this helps:

indx <- combn(colnames(df1),2)
res <- Reduce(`+`,lapply(split(indx, col(indx)), function(x) table(df1[x])))
dimnames(res) <- unname(dimnames(res))
res
#   a  b  c  d  e  f  g
#a  4  9  5  4  2  6  5
#b  8  6 13  6  5  9  3
#c  6  8  7  5  2  7  2
#d  4  3  5  6  2  2  6
#e  8  6  8 11  3  5  5
#f  4  4  3  5  2  1  4
#g  1  4  2  5  3  2  4
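
To make the `combn()` plus `Reduce()` step easier to follow, here is a sketch on a tiny made-up data frame (`toy`, with hypothetical columns `v1` to `v3`) where the counts can be checked by hand. All columns share the same factor levels, so every pairwise table has the same shape and the tables can be added element-wise.

```r
# Hypothetical toy data: three columns sharing the levels "x", "y", "z"
lv <- c("x", "y", "z")
toy <- data.frame(
  v1 = factor(c("x", "x", "y"), levels = lv),
  v2 = factor(c("y", "z", "z"), levels = lv),
  v3 = factor(c("z", "y", "x"), levels = lv)
)

# Every unordered pair of columns, one pair per matrix column
pairs <- combn(names(toy), 2)

# One square table per column pair, then an element-wise sum
tabs <- lapply(seq_len(ncol(pairs)), function(k) table(toy[pairs[, k]]))
res <- unclass(Reduce(`+`, tabs))
dimnames(res) <- unname(dimnames(res))
res

# Optional: fold (i, j) and (j, i) together for an undirected count
sym <- res + t(res)
sym
```

The result `res` is directional (rows come from the earlier column of each pair), which is why the matrix above is not symmetric; adding the transpose is one way to get an undirected count if that is what is needed.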

Data

a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a,b, stringsAsFactors=FALSE)

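Multi-column data

 # note: the default algorithm behind sample() changed in R 3.6.0, so the
 # values drawn here (and the counts shown above) may differ in newer versions
 set.seed(24)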
 df1 <- as.data.frame(matrix(sample(letters[1:7], 6*16, replace=TRUE), ncol=6))
 lvls1 <- sort(as.character(unique(unlist(df1))))
 df1[] <- lapply(df1, function(x) factor(x, levels=lvls1))