Creating a square matrix from a data.table

Posted: 2017-03-12 18:19:38

Tags: r foreach data.table r-package

I'm trying to create a square matrix of preferences or counts (which one doesn't really matter) from the entries of a data.table.

Suppose I have the following data.table to work with:

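library(data.table)

segment <- c("track","track","track","round","round","sprint","sprint","sprint","sprint")
athlete <- c("gunnar","brandon","raphael","gunnar","ben","brandon","raphael","ben","gunnar")
time <- c(54,56,57,23,25,15,16,16,17)

df <- data.table(athlete, segment, time)

# gap to the fastest time in each segment (0 for the winner, negative otherwise)
df[, time_diff := min(time) - time, by = segment]

# winner = athlete with the fastest time in each segment
# (which.min does not rely on the rows being sorted by time)
df[, winner := athlete[which.min(time)], by = segment]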

    athlete segment time time_diff  winner
 1:  gunnar   track   54         0  gunnar
 2: brandon   track   56        -2  gunnar
 3: raphael   track   57        -3  gunnar
 4: raphael   round   23         0 raphael
 5:     ben   round   25        -2 raphael
 6: brandon   round   28        -5 raphael
 7: brandon  sprint   15         0 brandon
 8: raphael  sprint   16        -1 brandon
 9:     ben  sprint   19        -4 brandon
10:  gunnar  sprint   26       -11 brandon

names <- unique(df$athlete)

[1] "gunnar"  "brandon" "raphael" "ben" 

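Now I would like a square athlete-by-athlete matrix showing each athlete's time difference against the winner of each segment (rows: athlete, columns: winner), something like: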

        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
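Put differently, each cell is the sum of time_diff over the segments in which the row athlete raced against that column's winner, and pairs that never met are 0. A minimal data.table sketch of that long form is a single grouped aggregation by athlete and winner. It assumes the vectors from the code above (so the numbers differ from the printed table) and picks each winner with which.min:

```r
library(data.table)

# Toy data: the vectors from the question's code
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael", "gunnar", "ben",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c("track", "track", "track", "round", "round",
              "sprint", "sprint", "sprint", "sprint"),
  time    = c(54, 56, 57, 23, 25, 15, 16, 16, 17)
)

# Gap to the fastest time, and the fastest athlete, within each segment
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

# Long form of the target matrix: one row per (athlete, winner) pair that
# occurs; gunnar faces himself as winner twice (track and round), hence sum()
long <- df[, .(pref = sum(time_diff)), by = .(athlete, winner)]
print(long)
```

Turning this into the square layout is then only a reshaping or indexing step; no per-athlete loop is needed.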

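I have a few ideas about how to solve this, but none of them seem to work out. I come from a MATLAB background, where I would simply iterate, but I have the feeling that this is not the data.table way at all.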

I feel I should be able to do it by iterating over the athletes with foreach. Something like:

library(foreach)
foreach(n = seq_along(names)) %do% df[athlete == names[n], .(time_diff, winner), by = segment][, .(pref = sum(time_diff)), by = winner]

[[1]]
    winner pref
1:  gunnar    0
2: brandon  -11

[[2]]
    winner pref
1:  gunnar   -2
2: raphael   -5
3: brandon    0

[[3]]
    winner pref
1:  gunnar   -3
2: raphael    0
3: brandon   -1

[[4]]
    winner pref
1: raphael   -2
2: brandon   -4

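But at this point I'm stuck and not sure how to continue. My initial idea was to create a vector of the right length, vec <- vector(mode="double", length=length(names)), and then index into it with which(names %in% df[,winner,by=IREALLYDONTKNOW]), but as you can see, I'm not clear on how to do that properly.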
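On the indexing idea: base R can index a matrix with a two-column character matrix, where each row names one (row, column) cell through the dimnames. Combined with a grouped sum over (athlete, winner), that fills the whole matrix in one assignment without a loop. This is a sketch under the same assumptions as the question's code (its vectors, with which.min for the winner); the variable ath is a hypothetical rename of names, used only to avoid confusion with the base names() function:

```r
library(data.table)

# Toy data: the vectors from the question's code
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael", "gunnar", "ben",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c("track", "track", "track", "round", "round",
              "sprint", "sprint", "sprint", "sprint"),
  time    = c(54, 56, 57, 23, 25, 15, 16, 16, 17)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

ath <- unique(df$athlete)

# Square matrix of zeros with athlete names on both dimensions
vec <- matrix(0, nrow = length(ath), ncol = length(ath),
              dimnames = list(ath, ath))

# Summed gap for every (athlete, winner) pair that actually occurs
pref <- df[, .(pref = sum(time_diff)), by = .(athlete, winner)]

# Row i of the character index matrix addresses the cell
# vec[pref$athlete[i], pref$winner[i]], so all cells are written at once
vec[cbind(pref$athlete, pref$winner)] <- pref$pref
print(vec)
```

Pairs that never occur are simply never written, so they keep the initial 0.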

If anyone is willing to give me some hints on the proper data.table approach, I'd be very grateful.

2 answers:

Answer 0 (score: 2)

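Although running your code does not reproduce the table you printed, I think what you are looking for is dcast (the data.table method, dcast.data.table):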

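# rows = athlete, columns = winner; sum() handles an athlete meeting the same
# winner in several segments, and pairs that never occurred become 0, not NA
dt_compare <- dcast(df, athlete ~ winner, value.var = "time_diff", fun.aggregate = sum, fill = 0)
# add zero columns for athletes that did not win any segment
non_winners <- setdiff(dt_compare$athlete, names(dt_compare))
if (length(non_winners) > 0) dt_compare[, (non_winners) := 0]
# reorder the columns so they follow the same order as the rows
setcolorder(dt_compare, c("athlete", dt_compare$athlete))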
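If a plain numeric matrix is wanted at the end, rather than a data.table with an athlete column, the dcast result can be converted. The sketch below repeats the reshaping steps so it runs on its own, assuming the vectors from the question's code; note that dcast sorts the athletes alphabetically, so the order differs from unique(df$athlete):

```r
library(data.table)

# Toy data: the vectors from the question's code
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael", "gunnar", "ben",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c("track", "track", "track", "round", "round",
              "sprint", "sprint", "sprint", "sprint"),
  time    = c(54, 56, 57, 23, 25, 15, 16, 16, 17)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

# Wide table: sum repeated pairs, fill pairs that never met with 0
dt_compare <- dcast(df, athlete ~ winner, value.var = "time_diff",
                    fun.aggregate = sum, fill = 0)
non_winners <- setdiff(dt_compare$athlete, names(dt_compare))
if (length(non_winners) > 0) dt_compare[, (non_winners) := 0]
setcolorder(dt_compare, c("athlete", dt_compare$athlete))

# Drop the athlete column into the row names of a numeric matrix
m <- as.matrix(dt_compare[, dt_compare$athlete, with = FALSE])
rownames(m) <- dt_compare$athlete
print(m)
```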

Answer 1 (score: 0)

Once I realized a few things, the way I solved it turned out to be fairly easy:

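library(foreach)

names <- unique(df$athlete)

# square matrix of zeros, named by athlete on both dimensions
vec <- matrix(data = 0, nrow = length(names), ncol = length(names), dimnames = list(names, names))

# one small table per athlete: summed time_diff against each winner they faced
pref <- foreach(n = seq_along(names)) %do% df[athlete == names[n], .(time_diff, winner), by = segment][, .(pref = sum(time_diff)), by = winner]

# write each athlete's results into their row, in the columns of those winners
for (n in seq_along(names)) vec[names[n], pref[[n]]$winner] <- pref[[n]]$pref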

> vec
        gunnar brandon raphael ben
gunnar       0     -11       0   0
brandon     -2       0      -5   0
raphael     -3      -1       0   0
ben          0      -4      -2   0
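For comparison, a base R route that neither answer uses: xtabs() with a left-hand side sums that variable within each cell, and giving athlete and winner factors with the same levels keeps the table square, with 0 for pairs that never met. This is a sketch assuming the vectors from the question's code, with data.table used only to compute time_diff and winner:

```r
library(data.table)

# Toy data: the vectors from the question's code
df <- data.table(
  athlete = c("gunnar", "brandon", "raphael", "gunnar", "ben",
              "brandon", "raphael", "ben", "gunnar"),
  segment = c("track", "track", "track", "round", "round",
              "sprint", "sprint", "sprint", "sprint"),
  time    = c(54, 56, 57, 23, 25, 15, 16, 16, 17)
)
df[, time_diff := min(time) - time, by = segment]
df[, winner := athlete[which.min(time)], by = segment]

lev <- unique(df$athlete)

# Same factor levels on both sides keep the table square; xtabs keeps
# unused levels by default, so athletes who never won still get a column
cross <- data.frame(
  athlete   = factor(df$athlete, levels = lev),
  winner    = factor(df$winner,  levels = lev),
  time_diff = df$time_diff
)

# With a left-hand side, xtabs sums time_diff within each (athlete, winner) cell
tab <- xtabs(time_diff ~ athlete + winner, data = cross)
print(tab)
```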