Suppose I have a data frame with three columns:
> df
A B C
1232 27.3 0.42
1232 27.3 0.36
1232 13.1 0.15
7564 13.1 0.09
7564 13.1 0.63
The desired output is:
[1232] [7564]
[13.1] 0.15 0.36
[27.3] 0.39 0
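I need to create a matrix with the unique values of B as rows and the unique values of A as columns. The value of any cell is computed by subsetting the original data frame to that cell's specific values of A and B and taking the mean of column C.
My code is: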
mat <- matrix(rep(0), length(unique(df$A)), nrow = length(sort(unique(df$B))))
# sort is to avoid NA
colnames(mat) <- unique(df$A)
rownames(mat) <- unique(df$B)
for (row in rownames(mat)) {
  for (col in colnames(mat)) {
    x <- subset(df, A == col & B == row)
    mat[row, col] = mean(df$C)
  }
}
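Since I have to work with matrices that have thousands of rows and columns, this is very slow. How can I make it run faster?
Answer 0 (score: 1)
You can combine aggregate() and xtabs():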
df <- read.table(header=TRUE, stringsAsFactors = FALSE, text=
"A B C
1232 27.3 0.42
1232 27.3 0.36
1232 13.1 0.15
7564 13.1 0.09
7564 13.1 0.63")
xtabs(C ~ B + A, data=aggregate(C ~ B + A, data=df, FUN=mean))
# > xtabs(C ~ B + A, data=aggregate(C ~ B + A, data=df, FUN=mean))
# A
# B 1232 7564
# 13.1 0.15 0.36
# 27.3 0.39 0.00
For other solutions, see: How to reshape data from long to wide format?
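As a further base-R sketch (not part of the answer above), tapply() can compute the per-cell means directly into a matrix. Combinations of A and B that never occur come back as NA; replacing them with 0 is an assumption made here to match the desired output in the question.

```r
# Same example data as in the question
df <- data.frame(
  A = c(1232, 1232, 1232, 7564, 7564),
  B = c(27.3, 27.3, 13.1, 13.1, 13.1),
  C = c(0.42, 0.36, 0.15, 0.09, 0.63)
)

# Mean of C for every (B, A) combination; rows are B, columns are A
mat <- tapply(df$C, list(B = df$B, A = df$A), mean)

# Assumption: combinations with no rows should be 0 rather than NA
mat[is.na(mat)] <- 0
mat
```

The result is an ordinary numeric matrix whose row and column names are the sorted unique values of B and A, so cells can be looked up by name, e.g. mat["13.1", "7564"].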
Answer 1 (score: 1)
Tidyverse solution:
library(tidyverse)
df %>%
group_by(A, B) %>%
summarise(C = mean(C)) %>%
spread(A, C)
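The pipeline above returns a tibble with B as an ordinary column and NA for missing combinations. If a plain matrix is needed, one possible variant (a sketch, assuming reasonably recent versions of dplyr, tidyr and tibble) uses pivot_wider(), which tidyr documents as the successor to spread(). Filling missing cells with 0 is an assumption taken from the desired output in the question.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the question
df <- data.frame(
  A = c(1232, 1232, 1232, 7564, 7564),
  B = c(27.3, 27.3, 13.1, 13.1, 13.1),
  C = c(0.42, 0.36, 0.15, 0.09, 0.63)
)

wide <- df %>%
  group_by(A, B) %>%
  summarise(C = mean(C), .groups = "drop") %>%
  # One column per value of A; absent combinations become 0 (assumption)
  pivot_wider(names_from = A, values_from = C, values_fill = 0)

# Move B into the row names and convert to a numeric matrix
mat <- wide %>%
  column_to_rownames("B") %>%
  as.matrix()
mat
```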
Answer 2 (score: 0)
You may want something like this (using data.table):
n <- 1e3
v <- LETTERS[1:5]
set.seed(42)
df <- data.frame(A = sample(v, n, replace = TRUE),
                 B = sample(v, n, replace = TRUE),
                 C = sample.int(1e2, n, replace = TRUE))
require(data.table)
dt <- as.data.table(df)
r <- dt[, .(v = mean(C)), keyby = .(A, B)] # calculate mean for each combination
r <- dcast(r, B ~ A, value.var = 'v') # transform to your structure
rmat <- as.matrix(r[, -1]) # to matrix
rownames(rmat) <- r[[1]] # add row names
rmat[1:5, 1:5]
# A B C D E
# A 53.00000 42.71739 53.11538 49.35000 53.14286
# B 50.62745 58.41379 60.43590 48.75000 56.56410
# C 43.75000 42.93548 55.45000 52.63415 44.27907
# D 50.00000 49.84314 57.48276 50.37143 53.16667
# E 43.95122 55.46667 55.38095 43.85366 53.22222
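The aggregation and the reshape can also be folded into a single dcast() call by passing fun.aggregate. The sketch below applies this to the small data frame from the question rather than the simulated data; fill = 0 for combinations that do not occur is an assumption based on the desired output there.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  A = c(1232, 1232, 1232, 7564, 7564),
  B = c(27.3, 27.3, 13.1, 13.1, 13.1),
  C = c(0.42, 0.36, 0.15, 0.09, 0.63)
)
dt <- as.data.table(df)

# Aggregate and reshape in one step; missing (B, A) pairs are set to 0
wide <- dcast(dt, B ~ A, value.var = "C", fun.aggregate = mean, fill = 0)

mat <- as.matrix(wide[, -1])   # drop the B column, keep the values
rownames(mat) <- wide[["B"]]   # use B as row names
mat
```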
P.S. The code you posted is incorrect: it takes the mean of the whole df$C instead of the subset x. The loop should be:
for (row in rownames(mat)) {
  for (col in colnames(mat)) {
    x <- subset(df, A == col & B == row)
    mat[row, col] = mean(x$C)
  }
}
P.P.S. The loop can be optimized like this, indexing with a logical vector instead of calling subset():
for (row in rownames(mat)) {
  for (col in colnames(mat)) {
    i <- (df$A == col & df$B == row)
    mat[row, col] <- mean(df[i, 'C'])
  }
}