Fill NAs by multiple conditions on the previous row

Time: 2018-07-17 12:47:18

Tags: r

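I have a large data frame of transactions. The fields are id (user ID), interval (an integer counting up from 0), creation (the transaction date), expiry (the date the subscription expires) and subscription (a character value, either "one year" or "two years"). I need to fill in the missing values of expiry according to several conditions based on the same row or the previous row.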
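The input vectors themselves are not shown in the post. As a minimal sketch, assuming creation and expiry are stored as Date and that all values below are invented purely for illustration, data of the described shape could look like this:

```r
# Hypothetical sample data; none of these values come from the original post.
id           <- c(1, 1, 1, 2, 2, 2)
interval     <- c(0, 0, 1, 0, 0, 0)
creation     <- as.Date(c("2018-01-01", "2018-02-01", "2019-01-01",
                          "2018-03-01", "2018-03-15", "2018-04-01"))
# Only the first transaction has a known expiry; the rest must be filled in.
expiry       <- as.Date(c("2019-01-01", NA, NA, NA, NA, NA))
subscription <- c("one year", "one year", "two years",
                  "one year", "one year", "one year")
```

Rows 5 and 6 are two consecutive missing values with the same id and interval, which is the case where the order of filling matters.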

df <- data.frame(id = id,
                 interval = interval,
                 creation = creation,
                 expiry = expiry,
                 subscription = subscription)
df <- df[order(df[, 1], df[, 3]),]

#loop all rows of ordered df (by subsID and payment date)
for (i in 2:nrow(df)) {
    # check NA of expiry
    if (is.na(df[i, 4])) { 
        #if previous row ID and interval match, we treat this as change to subscription
        if (df[i-1, 1] == df[i, 1] && df[i-1, 2] == df[i, 2]) {
            df[i, 4] <- df[i-1, 4]
        # otherwise it's one or two year new subscription so we add days to creation date
        } else if (df[i, 5] == "one year") {
            df[i, 4] <- df[i, 3] + 365
        } else if (df[i, 5] == "two years") {
            df[i, 4] <- df[i, 3] + 730
        }
     }
}

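The code above does the job, but it leaves the first NA unfilled, and it is so heavy that processing a data frame with millions of rows takes a very long time. How can I improve it and make it more R-like?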

1 answer:

Answer 0 (score: 0)

I think this may help you:

df <- data.frame(id = id,
                 interval = interval,
                 creation = creation,
                 expiry = expiry,
                 subscription = subscription)
df <- df[order(df[, 1], df[, 3]),]

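library(dplyr)

# TRUE where id and interval both equal the previous row's values
df$match_previous <- df$id == lag(df$id) & df$interval == lag(df$interval)
df$match_previous[1] <- FALSE  # the first row has no previous row

# if_else() keeps the Date class of expiry; base ifelse() would return plain numbers.
# This assumes creation and expiry are both Date columns.
df$expiry <- if_else(!is.na(df$expiry),
                     df$expiry,
                     if_else(df$match_previous,
                             lag(df$expiry),
                             if_else(df$subscription == "one year",
                                     df$creation + 365, df$creation + 730)))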
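One caveat with a purely lag-based fill: `lag()` reads the expiry column as it was before filling, so when several consecutive rows with the same id and interval are all NA, only the first of them picks up a value, whereas the loop in the question carries it down the whole run. The following is a base R sketch of one way to reproduce that behaviour without a loop. The function name `fill_expiry` and the sample data are hypothetical, it assumes creation and expiry are Date columns and the data frame has at least one row, and unlike the loop it also fills row 1.

```r
# Hypothetical helper: vectorised version of the row-by-row fill.
fill_expiry <- function(df) {
  df <- df[order(df$id, df$creation), ]
  n  <- nrow(df)

  # TRUE where id and interval equal those of the previous row (never for row 1)
  same_as_prev <- c(FALSE, df$id[-1] == df$id[-n] & df$interval[-1] == df$interval[-n])

  # Step 1: a missing expiry on a row that starts a new run is a new subscription.
  # Unknown subscription labels map to NA, so those rows stay missing.
  days    <- unname(c("one year" = 365, "two years" = 730)[as.character(df$subscription)])
  new_sub <- is.na(df$expiry) & !same_as_prev
  df$expiry[new_sub] <- df$creation[new_sub] + days[new_sub]

  # Step 2: within each run of consecutive rows sharing id and interval,
  # carry the last known expiry forward.
  run        <- cumsum(!same_as_prev)
  idx        <- ifelse(is.na(df$expiry), 0L, seq_len(n))  # 0 marks "still missing"
  last_known <- ave(idx, run, FUN = cummax)               # row index of the source value
  has_src    <- last_known > 0
  df$expiry[has_src] <- df$expiry[last_known[has_src]]
  df
}

# Invented sample data, for illustration only
df <- data.frame(
  id           = c(1, 1, 1, 2, 2, 2),
  interval     = c(0, 0, 1, 0, 0, 0),
  creation     = as.Date(c("2018-01-01", "2018-02-01", "2019-01-01",
                           "2018-03-01", "2018-03-15", "2018-04-01")),
  expiry       = as.Date(c("2019-01-01", NA, NA, NA, NA, NA)),
  subscription = c("one year", "one year", "two years",
                   "one year", "one year", "one year")
)
out <- fill_expiry(df)
```

In this sample the last two rows both inherit the expiry computed for the fourth row, because all three belong to the same run. Grouping by `cumsum(!same_as_prev)` keeps the carry-forward from leaking across users or intervals.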