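I'm trying to build a square matrix of preferences or counts (which of the two doesn't really matter) from the entries of a data.table.
Suppose I have the following data.table, which can be created with: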
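library(data.table)
segment <- c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete <- c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time <- c(54,56,57,23,25,15,16,16,17)
df <- data.table(athlete, segment, time)
# gap to the fastest time in each segment (0 for the winner, negative otherwise)
df[, time_diff := min(time) - time, by = segment]
# the winner is the athlete with the lowest time in the segment
df[, winner := athlete[which.min(time)], by = segment]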
    athlete segment time time_diff  winner
 1:  gunnar   track   54         0  gunnar
 2: brandon   track   56        -2  gunnar
 3: raphael   track   57        -3  gunnar
 4: raphael   round   23         0 raphael
 5:     ben   round   25        -2 raphael
 6: brandon   round   28        -5 raphael
 7: brandon  sprint   15         0 brandon
 8: raphael  sprint   16        -1 brandon
 9:     ben  sprint   19        -4 brandon
10:  gunnar  sprint   26       -11 brandon
names <- unique(df$athlete)
[1] "gunnar" "brandon" "raphael" "ben"
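Now I would like a square matrix over the athletes that shows each athlete's time difference against the winner of each segment, something like: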
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
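I have a few ideas in my head for how to approach this, but none of them has led anywhere. I come from a MATLAB background, where I would simply iterate, but I feel that is not the data.table way at all.
I think I should be able to do it by iterating over the athletes with foreach, something like: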
library(foreach)
foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
[[1]]
winner pref
1: gunnar 0
2: brandon -11
[[2]]
winner pref
1: gunnar -2
2: raphael -5
3: brandon 0
[[3]]
winner pref
1: gunnar -3
2: raphael 0
3: brandon -1
[[4]]
winner pref
1: raphael -2
2: brandon -4
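But at this point I'm stuck and not sure how to continue. My initial idea was to create a vector of the appropriate length, vec <- vector(mode="double", length=length(names)), and then index into it with which(names %in% df[,winner,by=IREALLYDONTKNOW]), but as you can see, it's not clear to me how to do this properly.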
If anyone could give me some hints on the proper data.table way to do this, I would greatly appreciate it.
Answer 0 (score: 2)
Although running your code does not produce the printed table, I think what you are looking for is dcast.data.table:
# sum time_diff per (athlete, winner) pair; pairs that never occur become 0 instead of NA
dt_compare <- dcast.data.table(df, athlete ~ winner, value.var = "time_diff", fun.aggregate = sum, fill = 0)
# add zero columns for athletes that did not win
dt_compare[, (setdiff(dt_compare[["athlete"]], names(dt_compare))) := 0]
# you can also reorder columns
setcolorder(dt_compare, c("athlete", dt_compare[["athlete"]]))
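For reference, here is a minimal sketch of the same reshaping idea carried through to an actual square matrix. It assumes the data shown in the printed table of the question (not the rows produced by the construction code), and it uses the generic dcast() rather than calling the method directly. The names athletes, wide, missing_cols and pref_mat are illustrative only.

```r
library(data.table)

# Data transcribed from the printed table in the question (an assumption:
# the question's construction code yields different rows).
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael",
              "raphael", "ben", "brandon",
              "brandon", "raphael", "ben", "gunnar"),
  segment = rep(c("track", "round", "sprint"), times = c(3, 3, 4)),
  time    = c(54, 56, 57, 23, 25, 28, 15, 16, 19, 26)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

athletes <- unique(df$athlete)

# One row per athlete, one column per winner; missing pairs become 0.
wide <- dcast(df, athlete ~ winner, value.var = "time_diff",
              fun.aggregate = sum, fill = 0)

# Athletes who never won have no column yet, so add zero columns for them.
missing_cols <- setdiff(athletes, names(wide))
if (length(missing_cols) > 0) wide[, (missing_cols) := 0]

# Convert to a plain numeric matrix with athletes on both dimensions.
pref_mat <- as.matrix(wide[, athletes, with = FALSE])
rownames(pref_mat) <- wide$athlete
pref_mat <- pref_mat[athletes, athletes]
print(pref_mat)
```

The guard on missing_cols is there because assigning to an empty set of columns is not something this sketch wants to rely on; with this data only ben lacks a column.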
Answer 1 (score: 0)
The way I solved it turned out to be fairly easy once a few things clicked:
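library(foreach)
names <- unique(df$athlete)
# square matrix of zeros with athletes as both row and column names
vec <- matrix(data = 0,nrow=length(names),ncol=length(names),dimnames=list(names,names))
# per athlete: summed time_diff against each segment winner
pref <- foreach(n=1:length(names)) %do% df[athlete==names[n],.(time_diff, winner),by=segment][,.(pref=sum(time_diff)),by=winner]
# write each athlete's sums into that athlete's row, indexed by winner name
foreach(n=1:length(names)) %do% (vec[names[n],pref[[n]]$winner] <- pref[[n]]$pref)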
> vec
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
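The same fill-a-zero-matrix idea can probably be written without foreach. The sketch below is one possible variant, not the method used above: it aggregates once by (athlete, winner) and then fills the matrix in a single step by indexing with a two-column character matrix, which R matches against the dimnames. It assumes the data from the printed table in the question; agg and pref_mat are illustrative names.

```r
library(data.table)

# Data transcribed from the printed table in the question (an assumption).
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael",
              "raphael", "ben", "brandon",
              "brandon", "raphael", "ben", "gunnar"),
  segment = rep(c("track", "round", "sprint"), times = c(3, 3, 4)),
  time    = c(54, 56, 57, 23, 25, 28, 15, 16, 19, 26)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

# One grouped aggregation replaces the per-athlete loop.
agg <- df[, .(pref = sum(time_diff)), by = .(athlete, winner)]

athletes <- unique(df$athlete)
pref_mat <- matrix(0, nrow = length(athletes), ncol = length(athletes),
                   dimnames = list(athletes, athletes))

# Each row of the index matrix is a (row name, column name) pair.
pref_mat[cbind(agg$athlete, agg$winner)] <- agg$pref
print(pref_mat)
```

Grouping by both columns at once keeps the work inside data.table, and the matrix assignment needs no explicit iteration over athletes.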