Question

Suppose we have the following data, with columns named "id", "time", and "x":

df <- structure(
  list(
    id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
    time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
    x = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
  ),
  .Names = c("id", "time", "x"),
  class = "data.frame",
  row.names = c(NA, -7L)
)

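Each id has multiple observations of time and x. I want to extract the last observation for each id and build a new data frame that repeats that observation as many times as the id has observations in the original data. I can extract the last observation for each id with the following code: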

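library(dplyr)

# Store the result under a new name so df (and each id's row count) is kept.
# x only takes the values 0 and 1, so the original condition
# (x == 0 & last row) | (x == 1 & last row) reduces to "last row of the group".
last_obs <- df %>%
    group_by(id) %>%
    filter(row_number() == n())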

All that remains is the repetition step. The expected output looks like this:

df <- structure(
  list(
    id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
    time = c(7L, 7L, 7L, 13L, 13L, 6L, 6L),
    x = c(0L, 0L, 0L, 1L, 1L, 0L, 0L)
  ),
  .Names = c("id", "time", "x"),
  class = "data.frame",
  row.names = c(NA, -7L)
)
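A minimal sketch of one way to get from the extraction step to this expected output, assuming dplyr is installed. It records each id's row count before filtering (n_obs is a hypothetical helper column introduced here), keeps the last row of each id, and then repeats that row n_obs times with ordinary row indexing.

```r
library(dplyr)

# Same data as in the question, built with data.frame() instead of structure()
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
  x    = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
)

last_obs <- df %>%
  group_by(id) %>%
  mutate(n_obs = n()) %>%           # remember how many rows each id has
  filter(row_number() == n()) %>%   # keep only the last row of each id
  ungroup()

# Repeat each kept row n_obs times, then drop the helper column
result <- last_obs[rep(seq_len(nrow(last_obs)), last_obs$n_obs), ] %>%
  select(-n_obs)

print(result)
```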

Many thanks for your help.

Answer 1

We can use ave to find the max row number for each id and use those row numbers to subset the data frame.
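A minimal base-R sketch of the approach described, assuming the rows of each id are already in time order so that an id's last row is its last observation. ave() replaces every row number with the largest row number in its id group, and indexing df with those numbers repeats each id's last row once per original observation.

```r
df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
  x    = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
)

# For each row, the largest row number within the same id (here 3 3 3 5 5 7 7)
last_row <- ave(seq_len(nrow(df)), df$id, FUN = max)

result <- df[last_row, ]
rownames(result) <- NULL   # drop the "3.1", "3.2", ... names created by duplicated rows
print(result)
```

Because ave() works on the id values rather than on runs, this also handles ids whose rows are not contiguous.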

Answer 2

You can use last() to get the last value within each id.

library(dplyr)

df %>%
    group_by(id) %>%
    mutate(time = last(time),   # last time value within each id
           x = last(x))         # last x value within each id

Because last(x) returns a single value per group, mutate() recycles it to fill every row of that group.
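A small sketch of that recycling on a toy data frame (the names toy, g, v and v_last are made up for illustration), assuming dplyr is installed:

```r
library(dplyr)

toy <- data.frame(g = c("a", "a", "b"), v = 1:3)

out <- toy %>%
  group_by(g) %>%
  # last(v) has length 1 in each group, so it is repeated over that group's rows
  mutate(v_last = last(v)) %>%
  ungroup()

print(out)
```

Group "a" gets 2 on both of its rows and group "b" gets 3.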

This can also be applied to any number of variables with mutate_at:

df %>%
    group_by(id) %>%
    mutate_at(vars(-id), ~ last(.))
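In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the same idea with across(), assuming a recent dplyr; inside a grouped mutate(), across(everything(), ...) skips the grouping column, so no vars(-id) is needed.

```r
library(dplyr)

df <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
  x    = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
)

result <- df %>%
  group_by(id) %>%
  # every non-grouping column is replaced by its last value within the id
  mutate(across(everything(), last)) %>%
  ungroup()

print(result)
```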

Answer 3

With data.table you can try

library(data.table)
setDT(df)[,.(time=rep(time[.N],.N), x=rep(x[.N],.N)), by=id]
   id time  x
1:  1    7  0
2:  1    7  0
3:  1    7  0
4:  2   13  1
5:  2   13  1
6:  3    6  0
7:  3    6  0

Following @thelatemai, to avoid naming the columns you can also try

df[, .SD[rep(.N,.N)], by=id]
   id time x
1:  1    7 0
2:  1    7 0
3:  1    7 0
4:  2   13 1
5:  2   13 1
6:  3    6 0
7:  3    6 0
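A variant that updates the table by reference instead of building a new one; a sketch assuming data.table is installed, where cols is a hypothetical helper name for "every column except id". Each lapply() call returns one value per group, and := recycles that value over the group's rows.

```r
library(data.table)

dt <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
  x    = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
)

cols <- setdiff(names(dt), "id")   # every column except the grouping column

# Overwrite each column in place with the last value of its id group
dt[, (cols) := lapply(.SD, function(v) v[length(v)]), by = id, .SDcols = cols]

print(dt)
```

Since := modifies dt itself, take a copy() first if the original values are still needed.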
