I have a data frame test:
> test
   foo  bar  baz  timestamp
1    1 <NA>    a 1552157998
2    1 <NA> <NA> 1552161596
3    1 stop <NA> 1552165194
4    1 <NA>    b 1552168795
5    1 <NA>    a 1552170839
6    1 <NA> <NA> 1552157998
7    1 stop <NA> 1552161596
8    1 <NA>    a 1552165194
9    1 <NA>    b 1552168795
10   1 <NA> <NA> 1552170839
My goal is, for each instance of stop, to find the closest non-NA value in each direction (based on timestamp), which would produce a table like this:
> output
  rownum pre post
1      3   a    b
2      7   a    a
Is there a known way to do this using zoo and na.locf()?
Any suggestions would be greatly appreciated.
dput(test)
structure(list(foo = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), bar = c(NA,
NA, "stop", NA, NA, NA, "stop", NA, NA, NA), baz = c("a", NA,
NA, "b", "a", NA, NA, "a", "b", NA), timestamp = c(1552157998.427,
1552161596.004, 1552165194.255, 1552168794.918, 1552170839.363,
1552157998.427, 1552161596.004, 1552165194.255, 1552168794.918,
1552170839.363)), row.names = c(NA, -10L), class = "data.frame")
Answer 0 (score: 3)
I'll use magrittr only to organize the code. With very little effort, this can be translated to non-magrittr base R, dplyr, or data.table.
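library(magrittr)
test %>%
  # sort by timestamp so that "closest" means closest in time
  .[ order(.$timestamp), ] %>%
  transform(.,
            rownum = seq_len(nrow(.)),                      # position after sorting
            pre = zoo::na.locf0(baz),                       # carry the last non-NA value forward
            post = zoo::na.locf0(baz, fromLast = TRUE)) %>% # carry the next non-NA value backward
  subset(., bar == "stop") %>%
  .[, c("rownum", "pre", "post")]
#   rownum pre post
# 7      4   a    a
# 3      5   a    a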
(This differs from your expected output; perhaps the expected output contains a mistake?)
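One possible explanation for the mismatch: the expected output in the question is consistent with filling in the original row order rather than in timestamp order. A minimal sketch, assuming that is what was intended (test is rebuilt from the dput in the question; the names filled and output are only illustrative):

```r
library(zoo)

# Rebuild the data frame from the dput() output in the question.
test <- data.frame(
  foo = rep(1, 10),
  bar = c(NA, NA, "stop", NA, NA, NA, "stop", NA, NA, NA),
  baz = c("a", NA, NA, "b", "a", NA, NA, "a", "b", NA),
  timestamp = c(1552157998.427, 1552161596.004, 1552165194.255,
                1552168794.918, 1552170839.363, 1552157998.427,
                1552161596.004, 1552165194.255, 1552168794.918,
                1552170839.363),
  stringsAsFactors = FALSE
)

# No sorting here: the fill follows the original row order.
filled <- transform(
  test,
  rownum = seq_len(nrow(test)),
  pre    = zoo::na.locf0(baz),                  # last non-NA value above
  post   = zoo::na.locf0(baz, fromLast = TRUE)  # next non-NA value below
)

# subset() treats NA in the condition as FALSE, so only the "stop" rows remain.
output <- subset(filled, bar == "stop", select = c(rownum, pre, post))
print(output)
```

With this ordering, rows 3 and 7 are the stop rows, which lines up with the rownum values in the question. Whether row order or timestamp order is the right notion of "closest" depends on what the data represents.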
You can get a better sense of what it does by looking at the result before the subset step:
test %>%
  .[ order(.$timestamp), ] %>%
  transform(.,
            rownum = seq_len(nrow(.)),
            pre = zoo::na.locf0(baz),
            post = zoo::na.locf0(baz, fromLast = TRUE))
#    foo  bar  baz  timestamp rownum pre post
# 1    1 <NA>    a 1552157998      1   a    a
# 6    1 <NA> <NA> 1552157998      2   a    a
# 2    1 <NA> <NA> 1552161596      3   a    a
# 7    1 stop <NA> 1552161596      4   a    a
# 3    1 stop <NA> 1552165194      5   a    a
# 8    1 <NA>    a 1552165194      6   a    a
# 4    1 <NA>    b 1552168795      7   b    b
# 9    1 <NA>    b 1552168795      8   b    b
# 5    1 <NA>    a 1552170839      9   a    a
# 10   1 <NA> <NA> 1552170839     10   a <NA>
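For anyone who prefers to avoid the pipe, the same steps can be written in plain base R. This is only a sketch of one possible translation (test is rebuilt from the dput in the question; the names sorted and result are illustrative and not part of the original answer):

```r
library(zoo)

# Rebuild the data frame from the dput() output in the question.
test <- data.frame(
  foo = rep(1, 10),
  bar = c(NA, NA, "stop", NA, NA, NA, "stop", NA, NA, NA),
  baz = c("a", NA, NA, "b", "a", NA, NA, "a", "b", NA),
  timestamp = c(1552157998.427, 1552161596.004, 1552165194.255,
                1552168794.918, 1552170839.363, 1552157998.427,
                1552161596.004, 1552165194.255, 1552168794.918,
                1552170839.363),
  stringsAsFactors = FALSE
)

# Step 1: sort by timestamp (order() keeps tied rows in their original order).
sorted <- test[order(test$timestamp), ]

# Step 2: add the position after sorting and the two filled columns.
sorted$rownum <- seq_len(nrow(sorted))
sorted$pre    <- zoo::na.locf0(sorted$baz)
sorted$post   <- zoo::na.locf0(sorted$baz, fromLast = TRUE)

# Step 3: keep only the "stop" rows; which() drops the NA comparisons.
result <- sorted[which(sorted$bar == "stop"), c("rownum", "pre", "post")]
print(result)
```

Note the use of which() in the last step: indexing with a logical vector that contains NA would add all-NA rows, whereas subset() in the piped version silently treats NA as FALSE.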