I want to fill NA values from the next row. Here is the dataset.
structure(list(timestamp = structure(c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L), .Label = c("2019-07-07 00:00:00", "2019-07-07 00:00:01",
"2019-07-07 00:00:02", "2019-07-07 00:00:03", "2019-07-07 00:00:04",
"2019-07-07 00:00:05", "2019-07-07 00:00:06", "2019-07-07 00:00:07",
"2019-07-07 00:00:08", "2019-07-07 00:00:09", "2019-07-07 00:00:10"
), class = "factor"), source = structure(c(NA, NA, NA, 1L, NA,
NA, 1L, NA, NA, NA, NA, NA, 2L, NA, 2L, NA, NA, 2L, NA, NA, 2L,
NA), .Label = c("USER_A", "USER_B"), class = "factor"), value = c(NA,
NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, 1L, NA, NA,
2L, NA, NA, 3L, NA)), class = "data.frame", row.names = c(NA,
-22L))
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 <NA> NA
6 2019-07-07 00:00:05 <NA> NA
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 <NA> NA
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 <NA> NA
17 2019-07-07 00:00:05 <NA> NA
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 <NA> NA
20 2019-07-07 00:00:08 <NA> NA
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
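The table cycles through the same timestamps once per source. Each source (A and B) occupies a fixed set of rows (here, 00:00:00 to 00:00:10).
This is the expected result.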
timestamp source value
1 2019-07-07 00:00:00 <NA> NA
2 2019-07-07 00:00:01 <NA> NA
3 2019-07-07 00:00:02 <NA> NA
4 2019-07-07 00:00:03 USER_A 1
5 2019-07-07 00:00:04 USER_A 1
6 2019-07-07 00:00:05 USER_A 1
7 2019-07-07 00:00:06 USER_A 1
8 2019-07-07 00:00:07 <NA> NA
9 2019-07-07 00:00:08 <NA> NA
10 2019-07-07 00:00:09 <NA> NA
11 2019-07-07 00:00:10 <NA> NA
12 2019-07-07 00:00:00 <NA> NA
13 2019-07-07 00:00:01 USER_B 1
14 2019-07-07 00:00:02 USER_B 1
15 2019-07-07 00:00:03 USER_B 1
16 2019-07-07 00:00:04 USER_B 2
17 2019-07-07 00:00:05 USER_B 2
18 2019-07-07 00:00:06 USER_B 2
19 2019-07-07 00:00:07 USER_B 3
20 2019-07-07 00:00:08 USER_B 3
21 2019-07-07 00:00:09 USER_B 3
22 2019-07-07 00:00:10 <NA> NA
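For USER_A, the value and source in rows 5 and 6 are replaced with the value and source of row 7. The USER_B rows are filled in the same way, each taking its values from the next non-NA row.
How can I do this in R?
Answer 0 (score: 1)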
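Here is one way using dplyr, which works because the number of rows per source is fixed. We first create a group for every n rows and add a new column, group2, that is 1 only between the min and max indices of the non-NA values in that group and 0 everywhere else. We then group_by group2 and fill the missing values with .direction = "up", so that each missing value takes the next non-missing value in its group, as in the expected output (rows 16 and 17 become 2, not 1).
n <- 11
library(dplyr)

df %>%
  # one group per block of n rows, i.e. one per source
  group_by(group1 = gl(n() / n, n)) %>%
  # flag the rows between the first and last non-NA source in each block
  mutate(group2 = 0,
         group2 = replace(group2,
                          min(which(!is.na(source))):max(which(!is.na(source))),
                          1)) %>%
  group_by(group2) %>%
  # fill each NA from the next non-missing row
  tidyr::fill(source, value, .direction = "up") %>%
  ungroup() %>%
  select(-group1, -group2)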
# A tibble: 22 x 3
# timestamp source value
# <fct> <fct> <int>
# 1 2019-07-07 00:00:00 NA NA
# 2 2019-07-07 00:00:01 NA NA
# 3 2019-07-07 00:00:02 NA NA
# 4 2019-07-07 00:00:03 USER_A 1
# 5 2019-07-07 00:00:04 USER_A 1
# 6 2019-07-07 00:00:05 USER_A 1
# 7 2019-07-07 00:00:06 USER_A 1
# 8 2019-07-07 00:00:07 NA NA
# 9 2019-07-07 00:00:08 NA NA
#10 2019-07-07 00:00:09 NA NA
# … with 12 more rows
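If dplyr and tidyr are not available, the same idea can be sketched in base R. This is only an illustration under stated assumptions, not the answer's method: the names fill_between and block are hypothetical, the data is rebuilt with plain character and integer columns rather than factors, and every source is assumed to occupy exactly n consecutive rows.

```r
# Rebuild the example data (plain vectors instead of factors, for simplicity).
pattern_a <- c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA, NA)
pattern_b <- c(NA, 1L, NA, 1L, NA, NA, 2L, NA, NA, 3L, NA)
df <- data.frame(
  timestamp = rep(sprintf("2019-07-07 00:00:%02d", 0:10), times = 2),
  source = c(ifelse(is.na(pattern_a), NA, "USER_A"),
             ifelse(is.na(pattern_b), NA, "USER_B")),
  value = c(pattern_a, pattern_b),
  stringsAsFactors = FALSE
)

# Hypothetical helper: between the first and last non-NA element of x,
# replace each NA with the next non-NA value. NAs outside that window stay NA.
fill_between <- function(x) {
  idx <- which(!is.na(x))
  if (length(idx) < 2) return(x)  # no interior gap to fill
  # Walk backwards so that x[i + 1] is always already filled.
  for (i in rev(seq(idx[1], idx[length(idx)]))) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

n <- 11                                          # rows per source (assumed fixed)
block <- rep(seq_len(nrow(df) / n), each = n)    # 1 for rows 1-11, 2 for rows 12-22

# Apply the helper separately to each block of n rows.
df$source <- unsplit(lapply(split(df$source, block), fill_between), block)
df$value  <- unsplit(lapply(split(df$value,  block), fill_between), block)

print(df)
```

Splitting by block keeps the two sources independent, so a value from USER_B can never leak into the USER_A rows even if a block happened to end with a missing value.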