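I've been working on this for a while and can solve it by brute force, but I'm looking for a more scalable approach.
The basic problem: how can I replace only some of the NAs, while leaving as NA the ones that exist only because a data series hasn't started yet? Here is an example:
Reproducible example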
library(tidyverse)
# Create dummy data
dates <- seq.Date(as.Date("2019-01-01"), as.Date("2019-01-10"), by = 1)
item_1 <- c(rep(NA,1), 1:7, NA, 8)
item_2 <- c(rep(NA,4), 1:3, rep(NA,2), 9)
item_3 <- c(rep(NA,3), 8:11, rep(NA,2), 2)
item_4 <- c(rep(NA,2), 1:6, rep(NA,2))
df <- data.frame(dates, item_1, item_2, item_3, item_4)
> df
        dates item_1 item_2 item_3 item_4
1  2019-01-01     NA     NA     NA     NA
2  2019-01-02      1     NA     NA     NA
3  2019-01-03      2     NA     NA      1
4  2019-01-04      3     NA      8      2
5  2019-01-05      4      1      9      3
6  2019-01-06      5      2     10      4
7  2019-01-07      6      3     11      5
8  2019-01-08      7     NA     NA      6
9  2019-01-09     NA     NA     NA     NA
10 2019-01-10      8      9      2     NA
# Replace NAs with zero --------------------
df_2 <- df %>%
  replace(., is.na(.), 0)
> df_2
        dates item_1 item_2 item_3 item_4
1  2019-01-01      0      0      0      0
2  2019-01-02      1      0      0      0
3  2019-01-03      2      0      0      1
4  2019-01-04      3      0      8      2
5  2019-01-05      4      1      9      3
6  2019-01-06      5      2     10      4
7  2019-01-07      6      3     11      5
8  2019-01-08      7      0      0      6
9  2019-01-09      0      0      0      0
10 2019-01-10      8      9      2      0
# Go back and restore the NAs that existed before the data of each column started
# Where the data first started (unique rows of the first non-NA value in each column)
list_of_1st_non_NAs <- unique(unlist(lapply(seq_len(ncol(df)), function(x)
  which(!is.na(df[, x]))[1])))
# Return a data frame showing where values first start
df_3 <- df[list_of_1st_non_NAs, ] %>%
  arrange(dates)
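Collapsing the start positions with unique() loses track of which column each row belongs to, and two columns that start on the same row would merge into a single entry. A small base-R sketch that keeps one start index per item column instead; `first_start` is an illustrative name, not something from the original code:

```r
# Rebuild the example data from the question so this block runs on its own
dates  <- seq.Date(as.Date("2019-01-01"), as.Date("2019-01-10"), by = 1)
item_1 <- c(rep(NA, 1), 1:7, NA, 8)
item_2 <- c(rep(NA, 4), 1:3, rep(NA, 2), 9)
item_3 <- c(rep(NA, 3), 8:11, rep(NA, 2), 2)
item_4 <- c(rep(NA, 2), 1:6, rep(NA, 2))
df <- data.frame(dates, item_1, item_2, item_3, item_4)

# Row index of the first non-NA value in each item column (dates excluded).
# vapply() keeps the column names, so each start stays tied to its column.
first_start <- vapply(df[-1], function(x) which(!is.na(x))[1], integer(1))
first_start
# item_1 = 2, item_2 = 5, item_3 = 4, item_4 = 3
```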
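This is where I'm stuck. I can see where each column's data starts, so I could brute-force the earlier rows of each column back to NA one column at a time, but I'm looking for a more systematic way to do it. Maybe some kind of lapply?
Thanks!
Desired output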
        dates item_1 item_2 item_3 item_4
1  2019-01-01     NA     NA     NA     NA
2  2019-01-02      1     NA     NA     NA
3  2019-01-03      2     NA     NA      1
4  2019-01-04      3     NA      8      2
5  2019-01-05      4      1      9      3
6  2019-01-06      5      2     10      4
7  2019-01-07      6      3     11      5
8  2019-01-08      7      0      0      6
9  2019-01-09      0      0      0      0
10 2019-01-10      8      9      2      0
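The lapply idea raised in the question can be finished in base R without restoring anything afterwards: for each item column, find its first non-NA position and set to 0 only the NAs at or after it. A minimal sketch, assuming the `df` from the reproducible example (rebuilt here so the block is self-contained); `df_fixed` and `item_cols` are illustrative names:

```r
dates  <- seq.Date(as.Date("2019-01-01"), as.Date("2019-01-10"), by = 1)
item_1 <- c(rep(NA, 1), 1:7, NA, 8)
item_2 <- c(rep(NA, 4), 1:3, rep(NA, 2), 9)
item_3 <- c(rep(NA, 3), 8:11, rep(NA, 2), 2)
item_4 <- c(rep(NA, 2), 1:6, rep(NA, 2))
df <- data.frame(dates, item_1, item_2, item_3, item_4)

item_cols <- names(df)[-1]   # every column except dates
df_fixed <- df
df_fixed[item_cols] <- lapply(df[item_cols], function(x) {
  start <- which(!is.na(x))[1]          # first observed row of this series
  if (is.na(start)) return(x)           # series never starts: keep all NA
  after_start <- seq_along(x) >= start  # rows from the start onward
  x[is.na(x) & after_start] <- 0        # zero only the later gaps
  x
})
df_fixed
```

Because the replacement is decided per column, no row bookkeeping (and no unique()) is needed.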
Answer (score: 3)
Here is one way with dplyr:
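df %>%
  # For every column except the first (dates), turn an NA into 0 only once
  # the column has already seen at least one non-NA value
  mutate_at(-1, ~replace(., is.na(.) & cumsum(!is.na(.)) > 0, 0))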
        dates item_1 item_2 item_3 item_4
1  2019-01-01     NA     NA     NA     NA
2  2019-01-02      1     NA     NA     NA
3  2019-01-03      2     NA     NA      1
4  2019-01-04      3     NA      8      2
5  2019-01-05      4      1      9      3
6  2019-01-06      5      2     10      4
7  2019-01-07      6      3     11      5
8  2019-01-08      7      0      0      6
9  2019-01-09      0      0      0      0
10 2019-01-10      8      9      2      0
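To see why the condition works, it helps to apply it to a single column. This sketch uses the values of item_2 from the question: cumsum(!is.na(x)) stays at 0 until the first observed value and is positive from then on, so "> 0" marks the rows where the series has started.

```r
x <- c(NA, NA, NA, NA, 1, 2, 3, NA, NA, 9)   # item_2 from the question

started <- cumsum(!is.na(x)) > 0   # FALSE before the first non-NA, TRUE after
to_zero <- is.na(x) & started      # missing AND after the series started

to_zero                  # TRUE only at positions 8 and 9
replace(x, to_zero, 0)   # NA NA NA NA 1 2 3 0 0 9
```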
Thanks to @Frank for a shorter condition: is.na(.) & cummax(!is.na(.))
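@Frank's variant relies on cummax(!is.na(x)) being 0 up to the first observed value and 1 from then on; `&` treats those as FALSE/TRUE, so no "> 0" comparison is needed. A base-R sketch applying it to every item column without dplyr; `fill_after_start` and `df_out` are hypothetical names chosen for illustration:

```r
dates  <- seq.Date(as.Date("2019-01-01"), as.Date("2019-01-10"), by = 1)
item_1 <- c(rep(NA, 1), 1:7, NA, 8)
item_2 <- c(rep(NA, 4), 1:3, rep(NA, 2), 9)
item_3 <- c(rep(NA, 3), 8:11, rep(NA, 2), 2)
item_4 <- c(rep(NA, 2), 1:6, rep(NA, 2))
df <- data.frame(dates, item_1, item_2, item_3, item_4)

# Zero out NAs that occur after a column's first observed value
fill_after_start <- function(x) replace(x, is.na(x) & cummax(!is.na(x)), 0)

df_out <- df
df_out[-1] <- lapply(df[-1], fill_after_start)   # leave dates untouched
df_out
```

With dplyr 1.0 or later, where mutate_at() is superseded, the same helper could be used as mutate(across(-dates, fill_after_start)).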