Question

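I have a dataset containing repeated observations of individuals in long format, so each row is an observation of either type A or type B. The following code reproduces the dataset.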

library(data.table)
set.seed(1487)
dat <- data.table(id = rep(seq(10), 2), 
                  type = c(rep("A", 10), rep("B", 10)), 
                  x = sample.int(100,20))
dat
#     id type  x
#  1:  1    A 38
#  2:  2    A 58
#  3:  3    A 28
#  4:  4    A 21
#  5:  5    A 19
#  6:  6    A 62
#  7:  7    A 52
#  8:  8    A 86
#  9:  9    A 85
# 10: 10    A 90
# 11:  1    B 15
# 12:  2    B 11
# 13:  3    B 37
# 14:  4    B 93
# 15:  5    B 34
# 16:  6    B 91
# 17:  7    B 79
# 18:  8    B 94
# 19:  9    B 24
# 20: 10    B 41

I then select the top three individuals, ranked by x, within each of the two observation types:

setorderv(dat, c("type", "x"), c(1, -1))
top3 <- dat[, head(.SD, 3), by = list(type)]
top3
#    type id  x
# 1:    A 10 90
# 2:    A  8 86
# 3:    A  9 85
# 4:    B  8 94
# 5:    B  4 93
# 6:    B  6 91

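Now I want to add a column containing the original x value of the same individual from the opposite observation type, if that makes sense. The following code reproduces what I am looking for: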

top3[,x2 := c(41, 94, 24, 86, 21, 62)]
#    type id  x x2
# 1:    A 10 90 41
# 2:    A  8 86 94
# 3:    A  9 85 24
# 4:    B  8 94 86
# 5:    B  4 93 21
# 6:    B  6 91 62

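Of course, I could go through the whole dataset row by row with if statements or something similar. But the original dataset is very large, and I am looking for an elegant and efficient approach. I really like data.table and have been using it a lot recently. I am sure there is a simple, elegant way to do this. I also tried using .GRP without success. I could use some help.

Thanks in advance!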

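My final solution

Thanks to those who provided inspiration. For anyone interested, here is my solution to the problem, which actually suits the purpose of the project better.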

dat <- dcast.data.table(dat, id~type, value.var = "x")
top3 <- rbind(dat[order(-A), head(.SD, 3L)][,rank_by := "A"],
              dat[order(-B), head(.SD, 3L)][,rank_by := "B"])
#    id  A  B rank_by
# 1: 10 90 41       A
# 2:  8 86 94       A
# 3:  9 85 24       A
# 4:  8 86 94       B
# 5:  4 21 93       B
# 6:  6 62 91       B
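For readers who still want the `type, id, x, x2` layout requested in the question, the wide result above can be mapped back with a few column expressions. This is only a sketch: the data values are copied from the printed table in the question, and the names `wide` and `long_again` are introduced here for illustration. It uses `dcast()` rather than `dcast.data.table()`, assuming a data.table version where `dcast()` dispatches on data.tables.

```r
library(data.table)

# Values copied from the printed data in the question
dat <- data.table(
  id   = rep(1:10, 2),
  type = rep(c("A", "B"), each = 10),
  x    = c(38, 58, 28, 21, 19, 62, 52, 86, 85, 90,
           15, 11, 37, 93, 34, 91, 79, 94, 24, 41)
)

# One row per id, with the A and B values side by side
wide <- dcast(dat, id ~ type, value.var = "x")

# Top three ids by A and by B, tagged with the ranking column
top3 <- rbind(wide[order(-A), head(.SD, 3L)][, rank_by := "A"],
              wide[order(-B), head(.SD, 3L)][, rank_by := "B"])

# x is the value of the ranking type, x2 the value of the opposite type
long_again <- top3[, .(type = rank_by,
                       id,
                       x  = ifelse(rank_by == "A", A, B),
                       x2 = ifelse(rank_by == "A", B, A))]
long_again
```

Keeping the data wide, as in the solution above, avoids this extra step whenever both values are needed side by side anyway.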

Cheers,

tstev

Answer 1

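It looks like you want to merge by id and the opposite type. Depending on your exact situation, I would probably just skip changing the type, merge on both types, and discard the rows where the types are the same (the code below assumes version 1.9.5+):

# Take the top 3 per type, join back to dat on id, drop same-type matches,
# then order by type and descending x (this matches the output shown below)
(dat[order(-x), head(.SD, 3), by = type]
    [dat, on = 'id', nomatch = 0][type != i.type]
    [order(type, -x)])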
#   type id  x i.type i.x
#1:    A 10 90      B  41
#2:    A  8 86      B  94
#3:    A  9 85      B  24
#4:    B  8 94      A  86
#5:    B  4 93      A  21
#6:    B  6 91      A  62
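The variant mentioned in passing above, changing the type first and then joining on both id and type, might look like the following. This is a sketch under assumptions: the data values are copied from the question's printed table, `lookup` is a name introduced here, and the `on = .(id, type)` syntax assumes a reasonably recent data.table release.

```r
library(data.table)

# Values copied from the printed data in the question
dat <- data.table(
  id   = rep(1:10, 2),
  type = rep(c("A", "B"), each = 10),
  x    = c(38, 58, 28, 21, 19, 62, 52, 86, 85, 90,
           15, 11, 37, 93, 34, 91, 79, 94, 24, 41)
)

# Top three rows per type, ranked by x
top3 <- dat[order(-x), head(.SD, 3), by = type]

# Lookup table with the type flipped, so that a row of type "A"
# finds the x value recorded under type "B" for the same id
lookup <- dat[, .(id, type = ifelse(type == "A", "B", "A"), x2 = x)]

# Update join: add x2 to top3 by reference, matching on id and flipped type
top3[lookup, on = .(id, type), x2 := i.x2]

top3 <- top3[order(type, -x)]
top3
```

Because the join is on both keys, each row of `top3` matches at most one row of `lookup`, so no same-type rows need to be filtered out afterwards.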

Answer 2

Probably not the most elegant way, but it works:

setkeyv(dat, c("type", "id"))

my.order <- dat[order(-rank(type)), .(id, type)]
dat[, x2 := dat[.(my.order$type, my.order$id), x]]

setorderv(dat, c("type", "x"), c(1, -1))
top3 <- dat[, head(.SD, 3), by = .(type)]
top3

#    type id  x x2
# 1:    A 10 90 41
# 2:    A  8 86 94
# 3:    A  9 85 24
# 4:    B  8 94 86
# 5:    B  4 93 21
# 6:    B  6 91 62

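Edit: Looking at @eddi's answer and the discussion about readability, I was reminded of the dplyr package. So, following his steps:

library(dplyr)

# Note: summarise_each() and funs() are deprecated in recent dplyr releases;
# slice_head(n = 3) is the current way to keep the first rows per group.
dat %>%
  arrange(desc(x)) %>%
  group_by(type) %>%
  summarise_each(funs(head(., 3))) %>%
  left_join(., dat, by = "id") %>%
  filter(type.x != type.y) %>%
  arrange(type.x, desc(id))
#   id type.x x.x type.y x.y
# 1 10      A  90      B  41
# 2  9      A  85      B  24
# 3  8      A  86      B  94
# 4  8      B  94      A  86
# 5  6      B  91      A  62
# 6  4      B  93      A  21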
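With a current dplyr (1.0.0 or later is assumed here, for `slice_max()`), the same idea can be sketched without the deprecated verbs. This is not the answer's original code but a variant: the type is flipped before the join so that each top-3 row matches exactly one row, and the names `flipped` and `res` are introduced for illustration. The data values are copied from the question's printed table.

```r
library(dplyr)

# Values copied from the printed data in the question
dat <- data.frame(
  id   = rep(1:10, 2),
  type = rep(c("A", "B"), each = 10),
  x    = c(38, 58, 28, 21, 19, 62, 52, 86, 85, 90,
           15, 11, 37, 93, 34, 91, 79, 94, 24, 41)
)

# Top three rows per type, ranked by x
top3 <- dat %>%
  group_by(type) %>%
  slice_max(x, n = 3) %>%
  ungroup()

# Same data with the type flipped and x renamed, used as a lookup
flipped <- dat %>%
  mutate(type = if_else(type == "A", "B", "A")) %>%
  rename(x2 = x)

# One match per row, so no same-type rows have to be filtered out
res <- top3 %>%
  left_join(flipped, by = c("id", "type")) %>%
  arrange(type, desc(x))
res
```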

Answer 3

How about

subset(merge(top3, dat, by = "id"), type.x != type.y)[, type.y:=NULL][]   
#   id type.x x.x x.y
#1:  4      B  93  21
#2:  6      B  91  62
#3:  8      A  86  94
#4:  8      B  94  86
#5:  9      A  85  24
#6: 10      A  90  41

(To keep the same column names as in the post, you would need to wrap it in setnames(..., c("id", "type", "x", "x2")).)

Answer 4

Probably not the most elegant way. However, I would suggest the following code:

[[0,2],[1,3]]

Kind regards

Lookup in R using data.table
