I have the following R data.table:
library(data.table)
dt =
   unique_point biased data_points team groupID
1:          up1  FALSE           3    1 xy28352
2:          up1   TRUE           4   22 xy28352
3:          up2  FALSE           1    4 xy28352
4:          up2   TRUE           0    3 xy28352
5:          up3  FALSE          12    5 xy28352
6:          up3   TRUE          35    7 xy28352
....
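I have formatted the data.table so that, for each unique_point, I record the data points for both the unbiased and the biased measurement. Each unique_point therefore has two rows, one with biased FALSE and one with biased TRUE. If there are no measurements, this is recorded as 0.
For example, for up1, the unbiased experiment has 3 data points and the biased experiment has 4 data points.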
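Each groupID has 25 teams, and each team may take both biased and unbiased measurements. I would like to reformat the data.table so that the number of data points is listed per team, for each unique point (given the data, this will produce many rows where data_points is 0):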
    unique_point biased data_points team groupID
 1:          up1  FALSE           3    1 xy28352
 2:          up1   TRUE           0    1 xy28352
 3:          up1  FALSE           0    2 xy28352
 4:          up1   TRUE           0    2 xy28352
 5:          up1  FALSE           0    3 xy28352
 6:          up1   TRUE           0    3 xy28352
....
44:          up1   TRUE           4   22 xy28352
....
49:          up1  FALSE           0   25 xy28352
50:          up1   TRUE           0   25 xy28352
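This task is close to "expanding" the data.table in some way. For each unique_point, I would create 50 rows: 25 teams, each with biased TRUE and FALSE. The added complexity is that I need to fill in data_points using the counts from the original table above.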
Is there a way to use unique() to work out which rows could possibly occur?
If I try
setkey(dt, team, unique_point)[CJ(unique(unique_point), unique(team)), .N, by=.EACHI]
I get the number of rows for each combination of unique_point and team. But this does not retain data_points.
Answer 0 (score: 2)
Use:
DT2 <- DT[, .SD[CJ(team = 1:25, biased = biased, unique = TRUE), on = .(biased, team)], by = .(unique_point, groupID)
][is.na(data_points), data_points := 0][]
setcolorder(DT2, c(1,3:5,2))
which gives:
> DT2
     unique_point biased data_points team groupID
  1:          up1  FALSE           3    1 xy28352
  2:          up1   TRUE           0    1 xy28352
  3:          up1  FALSE           0    2 xy28352
  4:          up1   TRUE           0    2 xy28352
  5:          up1  FALSE           0    3 xy28352
 ---
146:          up3   TRUE           0   23 xy28352
147:          up3  FALSE           0   24 xy28352
148:          up3   TRUE           0   24 xy28352
149:          up3  FALSE           0   25 xy28352
150:          up3   TRUE           0   25 xy28352
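What this does:
- You group DT by unique_point and groupID with by = .(unique_point, groupID).
- Within each group, you join the subset of data (.SD) with a complete reference table of team and biased (CJ(team = 1:25, biased = biased)).
- The result has NA values for the rows that were not present in DT. You therefore fill those with zeros using the [is.na(data_points), data_points := 0] part.
- The last part ([]) is not necessary, but it saves an extra step when printing to the console (see here for details). The setcolorder(DT2, c(1,3:5,2)) call is not needed either; it is only necessary when you want exactly the same column order as described in the question.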
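The steps above can be traced by hand for a single group. The following is a minimal sketch, assuming the sample data from this answer; the names sd_up1, ref, joined and n_missing are hypothetical helpers introduced only for illustration, and 0L is used instead of 0 simply to match the integer type of data_points.

```r
library(data.table)

# Sample data from this answer
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 1 xy28352
up1 TRUE 4 22 xy28352
up2 FALSE 1 4 xy28352
up2 TRUE 0 3 xy28352
up3 FALSE 12 5 xy28352
up3 TRUE 35 7 xy28352')

# Step 1: roughly what .SD holds for the group (up1, xy28352)
sd_up1 <- DT[unique_point == "up1" & groupID == "xy28352",
             .(biased, data_points, team)]

# Step 2: reference table with every team/biased combination (25 x 2 rows)
ref <- CJ(team = 1:25, biased = sd_up1$biased, unique = TRUE)

# Step 3: join; combinations that are absent from sd_up1 get NA in data_points
joined <- sd_up1[ref, on = .(biased, team)]
n_missing <- joined[, sum(is.na(data_points))]

# Step 4: replace the NA values with 0
joined[is.na(data_points), data_points := 0L]
print(joined)
```

Note that CJ(biased = biased) only expands the biased values that actually occur in the group. This relies on the question's statement that every unique_point has both a FALSE and a TRUE row; if that might not hold, passing biased = c(FALSE, TRUE) explicitly would be the safer choice.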
As an alternative, you can also use:
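# CJ() sorts by its columns in the order given, so team is listed before biased
# to get the FALSE/TRUE pair of each team on consecutive rows
DT2 <- DT[CJ(unique_point = unique_point, team = 1:25, biased = biased, groupID = groupID, unique = TRUE),
          on = .(unique_point, biased, team, groupID)
          ][is.na(data_points), data_points := 0][]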
The first 60 rows:
> DT2[1:60]
    unique_point biased data_points team groupID
 1:          up1  FALSE           3    1 xy28352
 2:          up1   TRUE           0    1 xy28352
 3:          up1  FALSE           0    2 xy28352
 4:          up1   TRUE           0    2 xy28352
 5:          up1  FALSE           0    3 xy28352
 6:          up1   TRUE           0    3 xy28352
 7:          up1  FALSE           0    4 xy28352
 8:          up1   TRUE           0    4 xy28352
 9:          up1  FALSE           0    5 xy28352
10:          up1   TRUE           0    5 xy28352
11:          up1  FALSE           0    6 xy28352
12:          up1   TRUE           0    6 xy28352
13:          up1  FALSE           0    7 xy28352
14:          up1   TRUE           0    7 xy28352
15:          up1  FALSE           0    8 xy28352
16:          up1   TRUE           0    8 xy28352
17:          up1  FALSE           0    9 xy28352
18:          up1   TRUE           0    9 xy28352
19:          up1  FALSE           0   10 xy28352
20:          up1   TRUE           0   10 xy28352
21:          up1  FALSE           0   11 xy28352
22:          up1   TRUE           0   11 xy28352
23:          up1  FALSE           0   12 xy28352
24:          up1   TRUE           0   12 xy28352
25:          up1  FALSE           0   13 xy28352
26:          up1   TRUE           0   13 xy28352
27:          up1  FALSE           0   14 xy28352
28:          up1   TRUE           0   14 xy28352
29:          up1  FALSE           0   15 xy28352
30:          up1   TRUE           0   15 xy28352
31:          up1  FALSE           0   16 xy28352
32:          up1   TRUE           0   16 xy28352
33:          up1  FALSE           0   17 xy28352
34:          up1   TRUE           0   17 xy28352
35:          up1  FALSE           0   18 xy28352
36:          up1   TRUE           0   18 xy28352
37:          up1  FALSE           0   19 xy28352
38:          up1   TRUE           0   19 xy28352
39:          up1  FALSE           0   20 xy28352
40:          up1   TRUE           0   20 xy28352
41:          up1  FALSE           0   21 xy28352
42:          up1   TRUE           0   21 xy28352
43:          up1  FALSE           0   22 xy28352
44:          up1   TRUE           4   22 xy28352
45:          up1  FALSE           0   23 xy28352
46:          up1   TRUE           0   23 xy28352
47:          up1  FALSE           0   24 xy28352
48:          up1   TRUE           0   24 xy28352
49:          up1  FALSE           0   25 xy28352
50:          up1   TRUE           0   25 xy28352
51:          up2  FALSE           0    1 xy28352
52:          up2   TRUE           0    1 xy28352
53:          up2  FALSE           0    2 xy28352
54:          up2   TRUE           0    2 xy28352
55:          up2  FALSE           0    3 xy28352
56:          up2   TRUE           0    3 xy28352
57:          up2  FALSE           1    4 xy28352
58:          up2   TRUE           0    4 xy28352
59:          up2  FALSE           0    5 xy28352
60:          up2   TRUE           0    5 xy28352
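To check that the grouped join and the single cross-join produce the same rows, the two results can be compared as sets. This is a minimal sketch, assuming the sample data from this answer; the names A, B and same_rows are hypothetical, and fsetequal() is used so that row order does not affect the comparison.

```r
library(data.table)

# Sample data from this answer
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 1 xy28352
up1 TRUE 4 22 xy28352
up2 FALSE 1 4 xy28352
up2 TRUE 0 3 xy28352
up3 FALSE 12 5 xy28352
up3 TRUE 35 7 xy28352')

# Approach 1: join .SD against the team/biased grid within each group
A <- DT[, .SD[CJ(team = 1:25, biased = biased, unique = TRUE), on = .(biased, team)],
        by = .(unique_point, groupID)]
A[is.na(data_points), data_points := 0L]

# Approach 2: one join against the full cross product of all four columns
B <- DT[CJ(unique_point = unique_point, team = 1:25, biased = biased,
           groupID = groupID, unique = TRUE),
        on = .(unique_point, biased, team, groupID)]
B[is.na(data_points), data_points := 0L]

# Align the column order, then compare the tables as sets of rows
setcolorder(A, names(B))
same_rows <- fsetequal(A, B)
print(same_rows)
```

One difference worth keeping in mind: the second approach crosses every unique_point with every groupID found in the table. With a single groupID, as in the sample data, both approaches agree, but with several groupIDs the second one may create combinations that the grouped version would not.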
Data used:
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 1 xy28352
up1 TRUE 4 22 xy28352
up2 FALSE 1 4 xy28352
up2 TRUE 0 3 xy28352
up3 FALSE 12 5 xy28352
up3 TRUE 35 7 xy28352')
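With this data, a few quick sanity checks on the expanded table are possible. The sketch below is only an illustration: it rebuilds DT2 with the cross-join variant, and expected_rows, rows_per_point and totals_match are hypothetical names; the 25 teams per groupID are taken from the question.

```r
library(data.table)

# Sample data from this answer
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 1 xy28352
up1 TRUE 4 22 xy28352
up2 FALSE 1 4 xy28352
up2 TRUE 0 3 xy28352
up3 FALSE 12 5 xy28352
up3 TRUE 35 7 xy28352')

DT2 <- DT[CJ(unique_point = unique_point, team = 1:25, biased = biased,
             groupID = groupID, unique = TRUE),
          on = .(unique_point, biased, team, groupID)]
DT2[is.na(data_points), data_points := 0L]

# Expected size: unique points x biased values x 25 teams
expected_rows <- uniqueN(DT$unique_point) * uniqueN(DT$biased) * 25L
print(nrow(DT2) == expected_rows)

# Each unique_point should contribute 50 rows
rows_per_point <- DT2[, .N, by = unique_point]
print(rows_per_point)

# Padding with zeros must not change the total number of data points
totals_match <- sum(DT2$data_points) == sum(DT$data_points)
print(totals_match)
```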