I am trying to get a frequency table from this data frame:
tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
.Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
class = "data.frame", row.names = c(NA, -3L))
# alternatively, read the same data from a file:
# tmp2 <- read.csv("tmp2.csv", sep=";")
tmp2
> tmp2
a1 a2 a3 b1 b2 b3
1 1 1 0 1 1 0
2 0 0 1 0 0 1
3 0 1 0 1 0 1
I tried to get the frequency table as follows:
table(tmp2[,1:3], tmp2[,4:6])
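But I get:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Expected output:
Info: the result does not have to be a square matrix. I should be able to add b4 and b5 and still keep a1, a2, a3.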
Answer 0 (score: 4)
If your a and b columns differ in number, you can try:
acols <- 1:3  # state the indices of the a columns
bcols <- 4:6  # same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[, rep(acols, length(bcols))] & tmp2[, rep(bcols, each = length(acols))]),
       ncol = length(bcols), nrow = length(acols),
       dimnames = list(colnames(tmp2)[acols], colnames(tmp2)[bcols]))
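As a cross-check, the same co-occurrence counts can probably also be obtained with a matrix product. This is a minimal sketch using base R's `crossprod`, not the method given in this answer. The names `a`, `b` and `counts` are only illustrative, and it assumes every column holds 0/1 values.

```r
# Rebuild the example data so the snippet runs on its own
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

a <- as.matrix(tmp2[, 1:3])  # the a columns
b <- as.matrix(tmp2[, 4:6])  # the b columns; need not match a in number

# crossprod(a, b) is t(a) %*% b: entry [i, j] sums a_i * b_j over the rows,
# i.e. it counts the rows where both indicators are 1
counts <- crossprod(a, b)
counts
#    b1 b2 b3
# a1  1  1  0
# a2  2  1  1
# a3  0  0  1
```

Because the product only requires the two matrices to have the same number of rows, adding b4 or b5 just means widening the column range used for `b`.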
Answer 1 (score: 1)
Here is a possible solution:
aIdxs <- 1:3
bIdxs <- 4:6  # use 4:7 once a b4 column has been added
# init matrix
m <- matrix(0,
nrow = length(aIdxs), ncol=length(bIdxs),
dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))
# create all combinations of a's and b's column indexes
idxs <- expand.grid(aIdxs,bIdxs)
# for each line and for each combination we add 1
# to the matrix if both a and b column are 1
for(r in 1:nrow(tmp2)){
m <- m + matrix(apply(idxs,1,function(x){ all(tmp2[r,x]==1) }),
nrow=length(aIdxs), byrow=FALSE)
}
> m
b1 b2 b3
a1 1 1 0
a2 2 1 1
a3 0 0 1
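To illustrate the non-square case the question asks about, here is a hedged sketch that counts each (a, b) pair with `outer` and `Vectorize` instead of the row loop. The column `b4` and its values are made up for the example, and `count_pair` and `m2` are hypothetical names that do not come from the answer.

```r
# Rebuild the example data so the snippet runs on its own
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))
tmp2$b4 <- c(1L, 1L, 0L)  # hypothetical extra column, values are an assumption

aIdxs <- 1:3
bIdxs <- 4:7

# Number of rows in which column i and column j are both 1
count_pair <- function(i, j) sum(tmp2[[i]] == 1 & tmp2[[j]] == 1)

# outer() passes whole index vectors, so the scalar helper is vectorised first
m2 <- outer(aIdxs, bIdxs, Vectorize(count_pair))
dimnames(m2) <- list(colnames(tmp2)[aIdxs], colnames(tmp2)[bIdxs])
m2
#    b1 b2 b3 b4
# a1  1  1  0  1
# a2  2  1  1  1
# a3  0  0  1  1
```

The first three columns agree with the matrix printed above, and the result is 3 x 4, so the a and b groups do not need to be the same size.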
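Answer 2 (score: 0)
Another possible solution. Your input is a bit tricky for 'table' because each row inherently holds two sets of binary indicators, 'a' and 'b', and you only want to count pairwise co-occurrences between an 'a' and a 'b', so you have to loop over them. Below is a generalized (but probably not very elegant) function that works with different numbers of 'a' and 'b' columns: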
tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L,
1L, 0L), b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L,
1L)), .Names = c("a1", "a2", "a3", "b1", "b2", "b3"), class = "data.frame", row.names = c(NA,
-3L))
fun = function(x) t(do.call("cbind", lapply(x[,grep("a", colnames(x))],
function(p) rowSums(do.call("rbind", lapply(x[,grep("b", colnames(x))],
function(q) q*p ))))))
fun(tmp2)
#> fun(tmp2)
# b1 b2 b3
#a1 1 1 0
#a2 2 1 1
#a3 0 0 1
# let's do a bigger example
set.seed(1)
m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10, dimnames=list(paste("instance_", 1:5, sep=""), c(paste("a",1:4,sep=""), paste("b",1:6,sep=""))))
# Notice that the count of possible a and b elements are not equal
#> m
# a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
#instance_1 1 0 1 1 0 1 1 1 0 0
#instance_2 1 0 1 1 1 1 1 0 1 1
#instance_3 1 1 1 0 1 1 1 1 0 1
#instance_4 0 1 1 1 1 0 1 1 1 1
#instance_5 1 1 0 0 1 1 0 1 1 1
fun(as.data.frame(m))
#> fun(as.data.frame(m))
# b1 b2 b3 b4 b5 b6
#a1 3 4 3 3 2 3
#a2 3 2 2 3 2 3
#a3 3 3 4 3 2 3
#a4 2 2 3 2 2 2
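If you would still rather end up calling 'table', one way that seems to work is to first reshape the indicators into one (a, b) pair per row. The following is only a sketch under that assumption. The helper names `a_names`, `b_names`, `pairs` and `tab` are illustrative, and the column groups are picked by the prefixes "a" and "b".

```r
# Rebuild the small example so the snippet runs on its own
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

a_names <- grep("^a", names(tmp2), value = TRUE)
b_names <- grep("^b", names(tmp2), value = TRUE)

# For every row, pair each 'a' that is 1 with each 'b' that is 1
pairs <- do.call(rbind, lapply(seq_len(nrow(tmp2)), function(r) {
  expand.grid(a = a_names[unlist(tmp2[r, a_names]) == 1],
              b = b_names[unlist(tmp2[r, b_names]) == 1],
              KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
}))

# Fixing the factor levels keeps combinations that never occur as zeros
pairs$a <- factor(pairs$a, levels = a_names)
pairs$b <- factor(pairs$b, levels = b_names)

tab <- table(pairs$a, pairs$b)
tab
```

On the three-row example this gives the same counts as fun(tmp2); the long format is larger than the input, so for wide data the function above is likely the cheaper route.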