我的数据集很大,需要比较列之间的所有组合。所需的输出将是每个列组合的矩阵。
开始的数据框看起来像Data
:
set.seed(1)
Data <- data.frame(
ID = (1:100),
A = sample(1:10,10),
B = sample(1:20,100,replace = T),
C = sample(1:5,100,replace = T),
D = sample(1:20,100,replace = T)
)
Data
,我想知道两列中相同组合出现的频率。
(例如,同一行1
中的A
和4
中的B
多久出现一次)
对于A到D列之间的所有组合?
我正在使用:
require(dplyr)
X1 <- ddply(Data,.(A,B),transmute, count=length(ID))
并得到一个像这样的对象:
A B count
1 1 3 1
2 1 7 1
3 1 9 2
4 1 9 2
5 1 12 1
6 1 13 1
7 1 14 1
8 1 16 1
9 1 18 1
10 1 20 1
11 2 2 1
12 2 6 1
13 2 10 1
14 2 11 1
但是如何获得矩阵格式的count
结果呢?
A对B冷的输出如下:
B1 B2 B3 B4 B5 B6
A1 1 1 2 1 1 ...
A2 1 1 2 1 1
A3 2 1 1 1 1
A4 2 1 1 1 1
A5 1 1 2 1 2
A6 1 1 2 1 2
A7 1 3 1 1 1
A8 1 3 1 1 2
A9 1 3 2 1 2
A10 1 1 2 1 1
In the best case the result would be a `list` containing the objects `AB` `AC` ...`CD` as matrix.
答案 0 :(得分:1)
您可以这样做:
library(tidyverse)
X2 <-X1 %>% group_by(A,B) %>%
summarise(count=max(count)) %>% #use max instead of sum
ungroup() %>%
mutate(A=paste0("A",A),B=paste0("B",B)) %>%
spread(B,count,fill=0)
X3 <- as.matrix(X2[,2:ncol(X2)])
rownames(X3) <- as.character(X2$A)
答案 1 :(得分:0)
使用aggregate
和reshape
来考虑基数R:
agg <- aggregate(cbind(count=ID) ~ B + A, Data, FUN=length)
rdf <- reshape(agg, timevar = "B", idvar = "A",
drop = c("ID", "C", "D"),
direction = "wide")
# CLEAN-UP
rdf <- with(rdf, rdf[order(A), c("A", paste0("count.", 1:20))]) # RE-ORDER ROWS AND COLS
rownames(rdf) <- NULL # RESET ROW NAMES
colnames(rdf) <- gsub("count.", "B", names(rdf)) # RENAME COL NAMES
rdf[is.na(rdf)] <- 0 # CONVERT NAs TO O
rdf
# A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B20
# 1 1 1 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 2 1
# 2 2 0 0 0 0 0 0 2 0 1 3 0 0 2 0 1 0 0 0 1 0
# 3 3 1 1 0 0 0 1 0 0 1 0 0 0 0 2 1 1 1 0 0 1
# 4 4 1 0 0 0 1 2 0 0 0 2 1 1 0 1 0 1 0 0 0 0
# 5 5 0 0 0 0 1 2 1 3 0 0 0 1 0 0 0 0 0 0 1 1
# 6 6 1 0 2 0 1 2 1 0 0 0 0 0 1 0 0 0 0 0 0 2
# 7 7 1 0 0 0 1 1 1 0 0 2 1 0 0 0 0 2 1 0 0 0
# 8 8 0 0 2 0 0 0 1 2 1 0 2 0 0 0 0 0 0 1 1 0
# 9 9 1 0 0 0 0 1 1 0 0 0 0 1 1 2 1 0 0 1 1 0
# 10 10 1 1 0 0 0 0 0 0 1 1 0 2 0 1 1 0 1 0 1 0