I have a data frame like this:
KEY C1 C2 C3 C4
A 0 0 1 0
B 0 0 1 0
C 0 1 1 0
D 0 0 1 0
E 1 0 1 0
F 1 0 0 0
G 0 1 0 0
H 0 0 1 0
I 0 1 1 0
J 1 0 0 1
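I want to build the matrix shown further down, counting only rows in which exactly two variables have the value 1.
I don't want to count rows with more than two such values: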
KEY C1 C2 C3 C4
L 1 0 1 1
or fewer than two:
M 1 0 0 0
The output should be a frequency table:
   C1 C2 C3 C4
C1  3  0  1  1
C2  0  3  2  0
C3  1  2  7  0
C4  1  0  0  1
There may be more variables (up to C20) and, of course, more rows. Thanks for your help!
Answer 0 (score: 3)
Try
m1 <- t(df1[-1])
colnames(m1) <- df1[,1]
tcrossprod(m1)
# C1 C2 C3 C4
#C1 3 0 1 1
#C2 0 3 2 0
#C3 1 2 7 0
#C4 1 0 0 1
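Because the indicator columns hold only 0 and 1, multiplying the transposed matrix by itself counts, for each pair of columns, the rows in which both are 1; the diagonal holds each column's own total. A minimal equivalent sketch that skips the explicit transpose by calling crossprod() on the untransposed matrix. It assumes the same df1 as in this answer, rebuilt here only so the snippet runs on its own; the names m and res are illustrative:

```r
# Same data as df1 in this answer, rebuilt so the snippet is self-contained
df1 <- data.frame(
  KEY = LETTERS[1:10],
  C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L),
  C2 = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  C3 = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L),
  C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)
)

m <- as.matrix(df1[-1])   # rows = keys, columns = C1..C4
res <- crossprod(m)       # t(m) %*% m, the same product as tcrossprod(t(m))
res
```

The row and column labels of the result come from the variable names, so assigning the KEY values as column names of the transposed matrix is optional.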
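Regarding the subset part: if I first keep only the rows with exactly two 1s, I don't get the expected output from the question: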
df1 <- df1[rowSums(df1[-1])==2,]
m1 <- t(df1[-1])
colnames(m1) <- df1[,1]
tcrossprod(m1)
# C1 C2 C3 C4
#C1 2 0 1 1
#C2 0 2 2 0
#C3 1 2 3 0
#C4 1 0 0 1
df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L,
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame",
row.names = c(NA, -10L))
Answer 1 (score: 2)
It looks like you want to subset first. Try this:
df <- read.csv("file1.csv")
df2 <- subset(df, rowSums(df[,-1]) == 2)
m1 <- t(df2[-1])
colnames(m1) <- df2[,1]
tcrossprod(m1)
This gives
# C1 C2 C3 C4
# C1 2 0 1 1
# C2 0 2 2 0
# C3 1 2 3 0
# C4 1 0 0 1
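Since the question mentions more variables (up to C20) and more rows, the filter-then-multiply steps can be wrapped in a small function. This is only a sketch: pair_counts and its n_ones argument are hypothetical names, and it assumes the first column is the key and every other column is a 0/1 indicator. The sample data from the question is rebuilt inline so the snippet runs without a CSV file:

```r
# Hypothetical helper: co-occurrence counts over rows with exactly `n_ones` ones.
# Assumes column 1 is the key and all remaining columns are 0/1 indicators.
pair_counts <- function(df, n_ones = 2) {
  m <- as.matrix(df[-1])                # drop the key column
  keep <- rowSums(m) == n_ones          # rows with exactly n_ones ones
  crossprod(m[keep, , drop = FALSE])    # drop = FALSE keeps a matrix even for one row
}

# Sample data from the question
df <- data.frame(
  KEY = LETTERS[1:10],
  C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L),
  C2 = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  C3 = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L),
  C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)
)

res <- pair_counts(df)
res
```

On this data the result matches the matrix above. Each off-diagonal cell counts the kept rows that have that exact pair set, and each diagonal cell counts the kept rows that include that variable at all.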