I have a data frame that looks like this:
df_raw <- structure(list(date = structure(c(17075, 17076, 17077, 17108,
17109, 17110, 17111, 17112, 17113, 17221, 17222, 17223, 17224,
17225, 17226, 17227, 17228, 17229, 17230, 17231, 17232, 17286,
17075, 17076, 17077, 17078, 17079, 17080, 17081, 17082, 17083,
17084, 17085, 17086, 17087, 17088, 17089, 17090, 17091), class = "Date"),
Req_BU = c("12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "14004", "14004", "14004", "14004", "14004",
"14004", "14004", "14004", "14004", "14004", "14004", "14004",
"14004", "14004", "14004", "14004", "14004"), last_rec_date = c(1L,
1L, 1L, 1L, 1L, NA, NA, 3L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 1L,
1L, 1L, NA, NA, 3L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
3L, 1L, 1L, 1L, 1L, NA, 2L, 1L)), .Names = c("date", "Req_BU",
"last_rec_date"), row.names = c(NA, -39L), class = "data.frame")
> head(df_raw, 10)
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 NA
7 2016-11-06 12018 NA
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
> df_raw[22:30, ]
date Req_BU last_rec_date
22 2017-04-30 12018 1
23 2016-10-01 14004 NA
24 2016-10-02 14004 NA
25 2016-10-03 14004 1
26 2016-10-04 14004 1
27 2016-10-05 14004 1
28 2016-10-06 14004 1
29 2016-10-07 14004 1
30 2016-10-08 14004 NA
What I need to do is replace the NA values in the last_rec_date column with the number of days since the last non-NA value. This all needs to be done per group, using the grouping variable Req_BU. My data starts on 2016-10-01, and if a particular Req_BU starts with NA, I need to fill with 1 and keep doing so until there is a non-NA value, at which point the normal logic takes over.
I am looking for something like this:
> head(df_hope, 10)
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 1
7 2016-11-06 12018 2
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
> df_hope[22:30, ]
date Req_BU last_rec_date
22 2017-04-30 12018 1
23 2016-10-01 14004 1
24 2016-10-02 14004 1
25 2016-10-03 14004 1
26 2016-10-04 14004 1
27 2016-10-05 14004 1
28 2016-10-06 14004 1
29 2016-10-07 14004 1
30 2016-10-08 14004 1
library(dplyr)
df_not_working <- df_raw %>%
group_by(Req_BU) %>%
mutate(last_rec_date = ifelse(is.na(last_rec_date),
c(NA, diff(date)),
last_rec_date))
> df_not_working
Source: local data frame [39 x 3]
Groups: Req_BU [2]
# A tibble: 39 x 3
date Req_BU last_rec_date
<date> <chr> <dbl>
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 1
7 2016-11-06 12018 1
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
I tried this, but it doesn't even handle the first part of the logic I need.
The rest of my analysis is very dplyr-heavy, so I can use either that or a base solution if one exists. Thanks.
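A minimal sketch of why the attempt above falls short, using made-up vectors that mirror rows 5 to 8 of df_raw (the names gap_to_prev_row and attempt are illustrative only): diff() on the date column measures the gap to the previous row, not to the last non-NA row, so every NA in a daily series becomes 1.

```r
# Daily dates with a two-day NA stretch, mirroring rows 5-8 of df_raw
date <- as.Date(c("2016-11-04", "2016-11-05", "2016-11-06", "2016-11-07"))
last_rec_date <- c(1L, NA, NA, 3L)

# Gap in days between each row and the row directly before it
gap_to_prev_row <- c(NA, as.numeric(diff(date)))

# Same ifelse() logic as the dplyr attempt, without the grouping
attempt <- ifelse(is.na(last_rec_date), gap_to_prev_row, last_rec_date)
attempt  # 1 1 1 3, whereas the desired result is 1 1 2 3
```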
Answer 0 (score: 1)
Maybe this would work? It's not very R-ish, so perhaps someone has a better way.
fill_na <- function(df, colname) {
  # Running count of consecutive NA rows; reset to 1 after every non-NA value
  x <- 1
  for (i in seq_len(nrow(df))) {
    if (is.na(df[i, colname])) {
      # Replace the NA with its position in the current NA run
      df[i, colname] <- x
      x <- x + 1
    } else {
      x <- 1
    }
  }
  df
}
df_hope <- unsplit(lapply(split(df_raw, f = df_raw$Req_BU), fill_na, colname = "last_rec_date"), f = df_raw$Req_BU)
Edit: made a clearer example for testing the approach:
example_df <- structure(list(date = structure(c(17075, 17076, 17077, 17108,
17109, 17083, 17084, 17085, 17086, 17087), class = "Date"), Req_BU = c("12018",
"12018", "12018", "12018", "12018", "14004", "14004", "14004",
"14004", "14004"), last_rec_date = c(1L, 1L, 1L, NA, NA, NA,
NA, NA, 1L, 1L)), .Names = c("date", "Req_BU", "last_rec_date"
), row.names = c(1L, 2L, 3L, 4L, 5L, 31L, 32L, 33L, 34L, 35L), class = "data.frame")
> example_df
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 NA
5 2016-11-04 12018 NA
31 2016-10-09 14004 NA
32 2016-10-10 14004 NA
33 2016-10-11 14004 NA
34 2016-10-12 14004 1
35 2016-10-13 14004 1
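Starting with a data frame in which the NA values cross the "boundary" between Req_BU 12018 and 14004, split it by Req_BU value into a list of separate data frames. Then use lapply to apply the function above to each individual data frame, before using unsplit to return a single data frame.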
df_ex <- unsplit(lapply(split(example_df, f = example_df$Req_BU), fill_na, colname = "last_rec_date"), f = example_df$Req_BU)
> df_ex
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 2
31 2016-10-09 14004 1
32 2016-10-10 14004 2
33 2016-10-11 14004 3
34 2016-10-12 14004 1
35 2016-10-13 14004 1
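A vectorized variant could be sketched as follows. This is not the method from the answer above: fill_na_run is a hypothetical helper, and toy is made-up data in the shape of example_df. It assumes one row per day inside each NA stretch, so counting rows equals counting days, and it follows the rule stated in the question that NAs at the start of a group are all filled with 1 (the loop above numbers them 1, 2, 3 instead).

```r
# Hypothetical helper (not part of the answer above): number each NA within a
# run of consecutive NAs, but fill NAs at the very start of a group with 1.
fill_na_run <- function(x) {
  is_na <- is.na(x)
  # run_id increases at every non-NA value, so an NA run shares the id of the
  # non-NA value just before it; leading NAs get id 0
  run_id <- cumsum(!is_na)
  # Position of each NA inside its run (1, 2, 3, ...); non-NA rows get 0
  pos <- ave(as.integer(is_na), run_id, FUN = cumsum)
  out <- ifelse(is_na, pos, x)
  # Assumption taken from the question: leading NAs are all filled with 1
  out[is_na & run_id == 0] <- 1L
  out
}

# Small made-up data in the shape of example_df
toy <- data.frame(
  Req_BU = c("12018", "12018", "12018", "12018", "12018",
             "14004", "14004", "14004", "14004", "14004"),
  last_rec_date = c(1L, 1L, 1L, NA, NA, NA, NA, NA, 1L, 1L),
  stringsAsFactors = FALSE
)

# ave() applies the helper separately to each Req_BU group
toy$filled <- ave(toy$last_rec_date, toy$Req_BU, FUN = fill_na_run)
toy$filled
```

Because the helper works on a plain vector, it should also fit a dplyr pipeline, for example group_by(Req_BU) %>% mutate(last_rec_date = fill_na_run(last_rec_date)).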