I have the following data:
data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L), new_seq = c(2, 2,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
2, 2, 2, 2, NA, NA, NA, NA, NA, 4, 4, 4, 4, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 6, 6, 6, 6, 6, NA, NA, 8, 8, 8, NA, NA, NA), value = c(2L,
0L, 0L, 1L, 0L, 5L, 5L, 3L, 0L, 3L, 2L, 3L, 2L, 3L, 4L, 1L, 0L,
0L, 0L, 1L, 1L, 0L, 2L, 5L, 3L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 3L,
5L, 3L, 1L, 1L, 1L, 0L, 1L, 0L, 4L, 3L, 0L, 3L, 1L, 3L, 0L, 0L,
1L, 0L, 0L, 3L, 4L, 5L, 3L, 5L, 3L, 5L, 0L, 1L, 1L, 3L, 2L, 1L,
0L, 0L, 0L, 0L, 5L, 1L, 1L, 0L, 4L, 1L, 5L, 0L, 3L, 1L, 2L, 1L,
0L, 3L, 0L, 1L, 1L, 3L, 0L, 1L, 1L, 2L, 2L, 1L, 0L, 4L, 0L, 0L,
3L, 0L, 0L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl",
"data.frame"))
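The column new_seq refers to values of seq. For each value of new_seq that is not NA, I want to calculate the mean of the last 2 rows of value for the seq it refers to. So, for example, rows 1:2 of the new column should have the value 0.5 (the mean of rows 49:50), rows 51:54 should also have the value 0.5 (the mean of rows 49:50), but rows 60:63 should have the value 4 (the mean of rows 58:59). How can I do this with the tidyverse?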
Answer 0 (score: 2)
Something like this?
library(dplyr)

# calculate the mean value based on the last two rows of each seq
lookup <- data %>%
  group_by(seq) %>%
  mutate(rank = rev(seq_len(n()))) %>%  # 1 = last row of the group, 2 = second to last
  filter(rank <= 2) %>%
  summarise(new_column = mean(value)) %>%
  ungroup()

# match back to original dataset (only non-NA values of new_seq can be matched)
left_join(data, lookup, by = c("new_seq" = "seq"))
Result:
# A tibble: 100 x 4
     seq new_seq value new_column
   <int>   <dbl> <int>      <dbl>
 1     1       2     2        0.5
 2     1       2     0        0.5
 3     2      NA     0       NA
 4     2      NA     1       NA
...
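The same lookup-and-join idea can be sketched on a small table that is easy to check by hand. This is only an illustration under assumptions: the data frame toy is made up (it is not the asker's data), and slice_tail() (available in dplyr 1.0.0 and later) stands in for the rank/filter step.

```r
library(dplyr)

# Hypothetical toy data: new_seq points at a value of seq, or is NA
toy <- tibble(
  seq     = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(2L, 0L, 4L, 0L, 1L, 3L, 5L)
)

# Keep the last two rows of each seq, then average value within each seq
lookup <- toy %>%
  group_by(seq) %>%
  slice_tail(n = 2) %>%
  summarise(new_column = mean(value))

# Join the per-seq means back via new_seq; rows with NA in new_seq find no match
result <- left_join(toy, lookup, by = c("new_seq" = "seq"))
```

Here the last two rows of seq 2 hold 0 and 1, so every row whose new_seq is 2 should receive 0.5 and the remaining rows NA.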
Answer 1 (score: 0)
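Well, this is only half tidyverse, and I'm sure someone will do it better, but here's an attempt.
group_by and mutate make it easy to calculate the mean of the last two rows in each group, but I couldn't work out how to match new_seq against seq that way, so I did that part in base R.
library(dplyr)

# mean of the last two values within each seq (NA if a group has only one row)
dat2 <- data %>%
  group_by(seq) %>%
  mutate(end_val = (nth(value, -1L) + nth(value, -2L)) / 2) %>%
  ungroup()

# for each row, look up end_val of the seq that new_seq refers to (NA stays NA)
dat2$result <- apply(dat2, 1, function(x) {
  dat2$end_val[dat2$seq == x['new_seq']][1]
})
To check the result, I subset to the relevant rows (otherwise it is too long to see on screen at once) and added the original row numbers as a column.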
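For comparison, the lookup step can also be written without any packages. This is a minimal sketch under assumptions, not a tested replacement for the code above: toy is made-up data, and tapply() with tail() and match() are swapped in for the group_by/mutate/apply combination.

```r
# Hypothetical toy data: new_seq points at a value of seq, or is NA
toy <- data.frame(
  seq     = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(2L, 0L, 4L, 0L, 1L, 3L, 5L)
)

# Mean of the last two values within each seq; the names are the seq values
end_val <- tapply(toy$value, toy$seq, function(v) mean(tail(v, 2)))

# Position of each new_seq among those seq values (NA where new_seq is NA)
idx <- match(toy$new_seq, as.numeric(names(end_val)))

# Pick the matching mean; as.vector() drops the names and dim left by tapply()
toy$result <- as.vector(end_val[idx])
```

One difference to be aware of: with tail(v, 2), a seq that has only one row yields that single value, whereas the nth() version gives NA for such a group.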