Need R's cummax, but handling NAs correctly

Time: 2016-03-14 13:51:07

Tags: r dplyr

I have a data frame like this:

dput(df1)
structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449L, NA, NA, 
16449L, 16450L, 16451L, NA, NA, 16455L, 16456L, NA, NA, 16756L, 
NA, 16460L, 16464L, 16469L, NA, NA, 16469L)), .Names = c("x", 
"y"), row.names = c(NA, -20L), class = "data.frame")

I need to change the y column as follows (using dplyr):

df1 <- mutate(df1, y = ifelse(is.na(y), cummax(y), y))

However, cummax does not work for my case. How can I get the same effect some other way?

In the resulting output, the rows where y is NA should be filled with the last non-NA value of y. The rows are in order.

Alternatively, I tried something like this, which does not work:

mutate(df1, y = ifelse(is.na(y), max(y[1:row_number()], na.rm = TRUE), y))

Because row_number() is itself a vector of row indices rather than a single index for the current row, it produces an error.
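To illustrate why the plain cummax() attempt has no effect, here is a minimal base R sketch on the first five values of y. cummax() returns NA from the first NA onward, so the ifelse() puts NA back into every NA row.

```r
# First five values of y from the question
y <- c(16449L, NA, NA, 16449L, 16450L)

# cummax() propagates NA: everything after the first NA is NA
cm <- cummax(y)
print(cm)
# 16449 NA NA NA NA

# So the NA rows are "filled" with NA again and y comes back unchanged
res <- ifelse(is.na(y), cm, y)
print(res)
```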

Edit: the desired output is as follows:

structure(list(x = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), y = c(16449, 16449, 
16449, 16449, 16450, 16451, 16451, 16451, 16455, 16456, 16456, 
16456, 16756, 16756, 16460, 16464, 16469, 16756, 16756, 16469
)), class = "data.frame", .Names = c("x", "y"), row.names = c(NA, 
-20L))

2 answers:

Answer 0: (score: 3)

You can do this:

library(dplyr)

v = cummax(ifelse(is.na(df1$y), -Inf, df1$y))  # A. Webb suggested -Inf instead of 0, great!
mutate(df1, y = ifelse(is.na(y), v, y))
#    x     y
#1   1 16449
#2   2 16449
#3   3 16449
#4   4 16449
#5   5 16450
#6   6 16451
#7   7 16451
#8   8 16451
#9   9 16455
#10 10 16456
#11  1 16456
#12  2 16456
#13  3 16756
#14  4 16756
#15  5 16460
#16  6 16464
#17  7 16469
#18  8 16756
#19  9 16756
#20 10 16469

Or you can use data.table.
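The -Inf substitution in this answer can also be wrapped in a small helper. The sketch below is only an illustration: cummax_na is a hypothetical name, not a function from base R or dplyr. It additionally assumes that NAs at the very start of the vector, which have no earlier value to borrow, should stay NA rather than become -Inf.

```r
# cummax_na() is a hypothetical helper name (not part of base R or dplyr)
cummax_na <- function(x) {
  # Treat NA as -Inf so it can never raise the running maximum
  filled <- cummax(ifelse(is.na(x), -Inf, x))
  # Assumption: leading NAs have nothing before them, so restore NA there
  filled[filled == -Inf] <- NA
  filled
}

# y column from the question
y <- c(16449L, NA, NA, 16449L, 16450L, 16451L, NA, NA, 16455L, 16456L,
       NA, NA, 16756L, NA, 16460L, 16464L, 16469L, NA, NA, 16469L)

# Fill only the NA positions; non-NA values are kept as they are
result <- ifelse(is.na(y), cummax_na(y), y)
print(result)
```

Inside a dplyr pipeline the same helper could be used as mutate(df1, y = ifelse(is.na(y), cummax_na(y), y)).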

Answer 1: (score: 2)

Reduce() would be another option.
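A minimal sketch of how Reduce() might be applied here, assuming the first element of y is not NA. It illustrates the idea and is not necessarily the answerer's exact code. With accumulate = TRUE, Reduce() returns every intermediate value of the fold, which gives a running maximum that skips NAs.

```r
# y column from the question
y <- c(16449L, NA, NA, 16449L, 16450L, 16451L, NA, NA, 16455L, 16456L,
       NA, NA, 16756L, NA, 16460L, 16464L, 16469L, NA, NA, 16469L)

# Fold over y, carrying the maximum seen so far and ignoring NAs.
# Assumption: y[1] is not NA, so the accumulator starts from a real value.
running_max <- Reduce(function(acc, val) max(acc, val, na.rm = TRUE),
                      y, accumulate = TRUE)

# Replace only the NA positions with the running maximum
result <- ifelse(is.na(y), running_max, y)
print(result)
```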