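I have a long-format dataset that looks like this:
ReactionTime  X (a categorical variable)  Y (a categorical variable)
1.23          1                           4
2.33          2                           4
3.45          3                           5
1.44          4                           2
1.27          5                           6
5.44          5                           5
3.22          7                           4
3.22          8                           2
3.56          1                           4
I would like to convert the dataset above into a matrix with variable X along the horizontal axis and variable Y along the vertical axis. However, as you can see, the first and last observations share the same cell: both have X = 1 and Y = 4. My intention is to compute the mean reaction time of the first and last observations and put that mean into the cell. Is this possible? Thanks!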
Answer 0 (score: 1)
If I understand your question correctly, the following should work:
library(dplyr); library(tidyr); library(tibble)
df %>%
# calculate mean reaction time for each cell
group_by(X, Y) %>%
summarise(ReactionTime = mean(ReactionTime)) %>%
ungroup() %>%
# spread cells (if you don't want NAs in empty cells, use the 2nd version)
spread(Y, ReactionTime) %>%
# spread(Y, ReactionTime, fill = 0) %>%
# convert to matrix with X in row names & Y in column names
remove_rownames() %>%
column_to_rownames("X") %>%
as.matrix()
     2     4    5    6
1   NA 2.395   NA   NA
2   NA 2.330   NA   NA
3   NA    NA 3.45   NA
4 1.44    NA   NA   NA
5   NA    NA 5.44 1.27
7   NA 3.220   NA   NA
8 3.22    NA   NA   NA
Data:
df <- read.table(header = T, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")
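In current tidyr releases, spread() is superseded by pivot_wider(). As a rough sketch of the same reshaping, assuming tidyr 1.1.0 or later (where values_fn accepts a single function), the averaging and the reshape can be done in one call. The names wide and m are illustrative only:

```r
library(tidyr)

df <- read.table(header = TRUE, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")

# values_fn = mean averages observations that fall in the same (X, Y) cell;
# cells with no observations are left as NA
wide <- pivot_wider(df,
                    id_cols = X,
                    names_from = Y,
                    values_from = ReactionTime,
                    values_fn = mean,
                    names_sort = TRUE)

# convert to a matrix with X in the row names and Y in the column names
m <- as.matrix(wide[, -1])
rownames(m) <- wide$X
m
```

As in the answer above, this keeps X in the rows and Y in the columns; wrapping the result in t() would swap the two.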
Answer 1 (score: 1)
You are trying to reshape your data from long format to wide format.
dcast from the data.table package is designed for this kind of operation.
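A minimal sketch of what such a dcast call could look like for this data, assuming the data.table package is installed. It is not necessarily the code this answer originally showed, and the names wide and m are illustrative:

```r
library(data.table)

df <- read.table(header = TRUE, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")

# Y ~ X puts Y in the rows and X in the columns;
# fun.aggregate = mean averages observations that share a cell,
# and fill = NA_real_ requests NA for cells that have no observations
wide <- dcast(as.data.table(df), Y ~ X,
              value.var = "ReactionTime",
              fun.aggregate = mean,
              fill = NA_real_)

# convert to a matrix with Y in the row names and X in the column names
m <- as.matrix(as.data.frame(wide)[-1])
rownames(m) <- wide$Y
m
```

Swapping the formula to X ~ Y would put X in the rows instead.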
Answer 2 (score: 0)
Using base R:
df = read.table(header = T, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")
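# distinct X values become columns, distinct Y values become rows
# (named cols/rows to avoid masking the base function c())
cols <- sort(unique(df$X))
rows <- sort(unique(df$Y))
# each (X, Y) combination that actually occurs in the data
W <- unique(df[c("X", "Y")])
M <- matrix(NA_real_, nrow = length(rows), ncol = length(cols))
dimnames(M) <- list(rows, cols)
for (i in seq_len(nrow(W))) {
  c0 <- which(cols == W$X[i])
  r0 <- which(rows == W$Y[i])
  # rows of df that fall into this cell
  A <- which(df$X == W$X[i] & df$Y == W$Y[i])
  M[r0, c0] <- mean(df$ReactionTime[A], na.rm = TRUE)
}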
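The loop above fills the matrix one cell at a time. As an alternative sketch in base R (not part of the original answer), the cell means can be computed once with aggregate() and written into the matrix in a single step using matrix indexing. The names agg, rows, and cols are illustrative:

```r
df <- read.table(header = TRUE, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")

# one row per (X, Y) cell that occurs, holding the mean reaction time
agg <- aggregate(ReactionTime ~ X + Y, data = df, FUN = mean)

rows <- sort(unique(df$Y))
cols <- sort(unique(df$X))
M <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
            dimnames = list(rows, cols))

# a two-column index matrix addresses (row, column) pairs directly
M[cbind(match(agg$Y, rows), match(agg$X, cols))] <- agg$ReactionTime
M
```

Cells with no observations keep their initial NA, which matches what the loop produces.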
Answer 3 (score: 0)
Using SQL, with the sqldf package:
df = read.table(header = T, text = "ReactionTime X Y
1.23 1 4
2.33 2 4
3.45 3 5
1.44 4 2
1.27 5 6
5.44 5 5
3.22 7 4
3.22 8 2
3.56 1 4")
#.............................................................
library(sqldf)
c=data.frame(c=sort(unique(df$X)))
r=data.frame(r=sort(unique(df$Y)))
DAT=sqldf("select r.r , c.c ,
avg(ReactionTime) as M
from r cross join c
left join df
on r.r=df.Y and c.c=df.X
group by r.r, c.c
order by r.r, c.c")
M = matrix(DAT$M, nrow=dim(r)[1], ncol=dim(c)[1], byrow=TRUE)
dimnames(M)=list(r$r, c$c)
# M
#       1    2    3    4    5    7    8
# 2    NA   NA   NA 1.44   NA   NA 3.22
# 4 2.395 2.33   NA   NA   NA 3.22   NA
# 5    NA   NA 3.45   NA 5.44   NA   NA
# 6    NA   NA   NA   NA 1.27   NA   NA
Answer 4 (score: 0)
Try tapply. No packages are needed.
tapply(df$ReactionTime, df[c("Y", "X")], mean)
which gives:
   X
Y       1    2    3    4    5    7    8
  2    NA   NA   NA 1.44   NA   NA 3.22
  4 2.395 2.33   NA   NA   NA 3.22   NA
  5    NA   NA 3.45   NA 5.44   NA   NA
  6    NA   NA   NA   NA 1.27   NA   NA
Note: df in reproducible form is:
df <- structure(list(ReactionTime = c(1.23, 2.33, 3.45, 1.44, 1.27,
5.44, 3.22, 3.22, 3.56), X = c(1L, 2L, 3L, 4L, 5L, 5L, 7L, 8L,
1L), Y = c(4L, 4L, 5L, 2L, 6L, 5L, 4L, 2L, 4L)), .Names = c("ReactionTime",
"X", "Y"), class = "data.frame", row.names = c(NA, -9L))