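I have the following data frame and would like to:
(1) Build a 4x4 co-occurrence matrix.
(2) Do this with a loop, because my real dataset is larger and has more variables.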
a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a,b)
out <- matrix(0,
nrow = 4, # how do I call just the levels?
ncol = 4)
out
The following code does not work, but it may help show what I am trying to do.
for (i in 1:nrow(df)) {
ind <- which(lvls == df[i, "a"])
out[i, ind] <- 1
}
out
# loop over variables in b
for (j in 1:nrow(df)) {
ind <- which(lvls == df[j, "b"])
out[j, ind] <- 1
}
out
This is the output I am hoping for:
[a] [b] [c] [d]
[a] 0 4 4 0
[b] 4 0 0 4
[c] 4 0 0 4
[d] 0 4 4 0
Any help would be great. Thanks in advance!
Answer 0 (score: 3)
You could try:
lvls <- sort(as.character(unique(unlist(df))))
df[] <- lapply(df, function(x) factor(x, levels=lvls) )
m1 <- table(df)
m1[lower.tri(m1)] <- t(m1)[lower.tri(m1)] # mirror the upper triangle; indexing t(m1) keeps each cell paired with its transpose
class(m1) <- "matrix"
dimnames(m1) <- unname(dimnames(m1)) #as suggested by @Richard Scriven
m1
# a b c d
# a 0 4 4 0
# b 4 0 0 4
# c 4 0 0 4
# d 0 4 4 0
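Since the question asked for a loop, here is a minimal sketch of a loop-based version for comparison. It is not part of the original answer; it assumes the columns are plain character vectors (`stringsAsFactors = FALSE`), and the names `r` and `k` are only illustrative.

```r
# Self-contained sketch: build the symmetric co-occurrence matrix with a loop.
a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# All distinct values across both columns become the row/column labels.
lvls <- sort(unique(unlist(df)))
out <- matrix(0, nrow = length(lvls), ncol = length(lvls),
              dimnames = list(lvls, lvls))

for (i in seq_len(nrow(df))) {
  r <- df[i, "a"]                          # value in column a (assumed character)
  k <- df[i, "b"]                          # value in column b
  out[r, k] <- out[r, k] + 1               # count the pair
  if (r != k) out[k, r] <- out[k, r] + 1   # mirror it so the matrix stays symmetric
}
out
```

Indexing `out` by name avoids the `which(lvls == ...)` lookup from the question, and sizing the matrix from `length(lvls)` answers the "how do I call just the levels?" comment. For large data the `table()` approach above will usually be much faster than a row-by-row loop.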
Suppose your data changes so that a pair appears in reversed order (example provided by @user20650):
df[1, ] <- c("b", "a")
df[] <- lapply(df, function(x) factor(x, levels=lvls) )
m1 <- table(df)
m2 <- m1 + t(m1)
m2 #you can convert to class `matrix` and change the dimnames as above
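# Note: this output lists the levels as b, a, c, d; with the sorted `lvls` above,
# the same counts appear in a, b, c, d order.
#    b
# a   b a c d
#   b 0 4 0 4
#   a 4 0 4 0
#   c 0 4 0 4
#   d 4 0 4 0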
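To see why adding the transpose helps here, the following sketch rebuilds the changed data from scratch and checks the result. It assumes character columns and sorted levels, so the labels come out in a, b, c, d order.

```r
# Self-contained sketch: m1 + t(m1) pools a pair regardless of its direction.
df <- data.frame(a = rep(c("a", "a", "b", "c"), 4),
                 b = rep(c("b", "c", "d", "d"), 4),
                 stringsAsFactors = FALSE)
df[1, ] <- c("b", "a")                      # one a-b pair now appears as b-a

lvls <- sort(unique(unlist(df)))
df[] <- lapply(df, factor, levels = lvls)   # same levels in both columns

m1 <- table(df)      # directed counts: a-b is 3, b-a is 1
m2 <- m1 + t(m1)     # undirected counts: a-b and b-a are both 3 + 1
m2
```

Because every row is counted once in `m1` and once in `t(m1)`, the total of `m2` is twice the number of rows. Note that a pair of identical values (for example a-a) would be counted twice on the diagonal with this approach.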
If you do not want a symmetric matrix and would rather keep the actual counts:
df[] <- lapply(df, function(x) factor(x, levels=lvls) )
m1 <- table(df)
indx <- !m1 & lower.tri(m1)
m1[indx] <- t(m1)[indx] # fill empty lower-triangle cells from their transposed positions
class(m1) <- "matrix"
dimnames(m1) <- unname(dimnames(m1))
m1
# b a c d
#b 0 1 0 4
#a 3 0 4 0
#c 0 4 0 4
#d 4 0 4 0
# counts of each ordered pair, for comparison with the matrix above
table(as.character(interaction(df,sep="")))
#ab ac ba bd cd
#3 4 1 4 4
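The same pair counts can also be obtained without `interaction()`. The sketch below is an alternative, not the original answer's method: it pastes the two columns together for ordered pairs, and uses `pmin()`/`pmax()` on the character columns to put each pair in alphabetical order first when direction should not matter. It assumes character columns.

```r
# Self-contained sketch: ordered vs. unordered pair counts.
df <- data.frame(a = rep(c("a", "a", "b", "c"), 4),
                 b = rep(c("b", "c", "d", "d"), 4),
                 stringsAsFactors = FALSE)
df[1, ] <- c("b", "a")                      # one pair in reversed order

# Ordered pairs: "ab" and "ba" are counted separately.
ordered_counts <- table(paste0(df$a, df$b))

# Unordered pairs: sort each pair alphabetically before pasting.
unordered_counts <- table(paste0(pmin(df$a, df$b), pmax(df$a, df$b)))

ordered_counts
unordered_counts
```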
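Regarding multiple variables, I am not sure what result you expect, but perhaps this helps (`df1` is defined in the data section at the end):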
indx <- combn(colnames(df1),2)
res <- Reduce(`+`,lapply(split(indx, col(indx)), function(x) table(df1[x])))
dimnames(res) <- unname(dimnames(res))
res
# a b c d e f g
#a 4 9 5 4 2 6 5
#b 8 6 13 6 5 9 3
#c 6 8 7 5 2 7 2
#d 4 3 5 6 2 2 6
#e 8 6 8 11 3 5 5
#f 4 4 3 5 2 1 4
#g 1 4 2 5 3 2 4
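The `Reduce()` call above can also be written as an explicit loop over column pairs, which may be easier to adapt. This is a sketch on a small made-up data frame (the columns `x`, `y`, `z` and their values are hypothetical, chosen so the counts are easy to check by hand), not the sampled `df1` from the answer.

```r
# Self-contained sketch: sum the pairwise tables for every pair of columns.
df1 <- data.frame(x = c("a", "b", "c"),
                  y = c("b", "c", "a"),
                  z = c("c", "c", "b"),
                  stringsAsFactors = FALSE)

lvls1 <- sort(unique(unlist(df1)))
df1[] <- lapply(df1, factor, levels = lvls1)   # shared levels give square tables

res <- matrix(0L, nrow = length(lvls1), ncol = length(lvls1),
              dimnames = list(lvls1, lvls1))
pairs <- combn(colnames(df1), 2)               # each column is one pair of column names

for (j in seq_len(ncol(pairs))) {
  # unclass() drops the "table" class so res stays a plain matrix
  res <- res + unclass(table(df1[[pairs[1, j]]], df1[[pairs[2, j]]]))
}
res
```

As with the two-column case, `res` holds directed counts (rows come from the earlier column of each pair); `res + t(res)` would give a symmetric version if direction is not meaningful.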
Data
a <- rep(c("a", "a", "b", "c"), 4)
b <- rep(c("b", "c", "d", "d"), 4)
df <- data.frame(a,b, stringsAsFactors=FALSE)
Data with multiple columns
set.seed(24)
df1 <- as.data.frame(matrix(sample(letters[1:7], 6*16, replace=TRUE), ncol=6))
lvls1 <- sort(as.character(unique(unlist(df1))))
df1[] <- lapply(df1, function(x) factor(x, levels=lvls1))