R: Writing a loop to replace NULL with dates

Posted: 2014-06-15 18:07:43

Tags: r loops

Here is a sample of my table:

custID | StartDate | EndDate   | ReasonForEnd  | TransactionType | TransactionDate
1a     | NULL      | 2/12/2014 | AccountClosed | AccountOpened   | 1/15/2004
1a     | NULL      | 2/12/2014 | AccountClosed | Purchase        | 3/16/2004
.......
2b     | 7/7/2011  | 6/14/2013 | AccountClosed | AccountOpened   | 8/1/2010

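The problem is with the StartDate field. For each custID, if the entry is NULL, I want to replace it with the TransactionDate of the row where TransactionType = AccountOpened. If StartDate falls after that AccountOpened TransactionDate, I also want to replace it with that date.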

The real data has more than 250,000 rows. I could really use some help figuring out how to write this in R.
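For comparison, here is a base-R sketch of the same rule (not from the original thread). Replacing StartDate when it is NULL or later than the AccountOpened date amounts to keeping the earlier of the two dates, so pmin(..., na.rm = TRUE) can do it in one vectorised step instead of a row-by-row loop, which should matter at ~250,000 rows. It assumes each custID has at most one AccountOpened row, rebuilds only the three sample rows shown above, and uses illustrative names (date_cols, opened, open_date) that are not part of the question.

```r
# The three sample rows from the question (the "......." rows are omitted).
df <- data.frame(
  custID          = c("1a", "1a", "2b"),
  StartDate       = c("NULL", "NULL", "7/7/2011"),
  EndDate         = c("2/12/2014", "2/12/2014", "6/14/2013"),
  ReasonForEnd    = c("AccountClosed", "AccountClosed", "AccountClosed"),
  TransactionType = c("AccountOpened", "Purchase", "AccountOpened"),
  TransactionDate = c("1/15/2004", "3/16/2004", "8/1/2010"),
  stringsAsFactors = FALSE
)

# Parse m/d/Y strings; "NULL" does not match the format and becomes NA.
date_cols <- c("StartDate", "EndDate", "TransactionDate")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%m/%d/%Y")

# Look up each row's AccountOpened date by custID (match() takes the first hit;
# customers without an AccountOpened row get NA).
opened    <- df[df$TransactionType == "AccountOpened", c("custID", "TransactionDate")]
open_date <- opened$TransactionDate[match(df$custID, opened$custID)]

# "Replace if NA or later than the opening date" is the same as keeping the
# earlier date; na.rm = TRUE fills NA StartDates and keeps StartDate when
# open_date is NA.
df$StartDate <- pmin(df$StartDate, open_date, na.rm = TRUE)
print(df)
```

Both 1a rows end up with StartDate 2004-01-15, and 2b's 7/7/2011 is pulled back to its opening date, 2010-08-01.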

1 Answer:

Answer 0 (score: 1)

You could try the following, though I haven't tested it. I'm assuming your data.frame is called df.

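library(dplyr)

df %>%
    # Parse the date columns; "NULL" strings do not match the format and become NA
    mutate(across(c(StartDate, EndDate, TransactionDate),
                  ~ as.Date(as.character(.x), format = "%m/%d/%Y"))) %>%
    group_by(custID) %>%
    # Each customer's AccountOpened date (NA if the customer has no such row);
    # if_else() keeps the Date class, unlike base ifelse()
    mutate(OpenDate  = first(TransactionDate[TransactionType == "AccountOpened"]),
           StartDate = if_else(!is.na(OpenDate) & (is.na(StartDate) | StartDate > OpenDate),
                               OpenDate, StartDate)) %>%
    ungroup() %>%
    select(-OpenDate)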

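This code first converts the date columns to the Date class (in this step the NULL entries become NA), groups by custID, and then checks whether StartDate is NA or later than the TransactionDate of the row where TransactionType == "AccountOpened". If so, StartDate is replaced with that TransactionDate.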
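A small base-R check of two details behind this step (illustrative values only, not from the thread): the literal text "NULL" turns into NA when parsed with the m/d/Y format, and base ifelse() returns plain day counts instead of Dates, so a class-preserving alternative such as dplyr::if_else() is the safer choice when replacing dates inside mutate().

```r
# "NULL" does not match the %m/%d/%Y format, so as.Date() returns NA for it.
x <- as.Date(c("NULL", "7/7/2011"), format = "%m/%d/%Y")
print(x)

# base::ifelse() builds its result from the logical test vector, so the Date
# class is lost and the underlying day counts since 1970-01-01 come back.
y <- ifelse(is.na(x), as.Date("2004-01-15"), x)
print(class(y))

# Restoring the class by hand recovers the intended dates.
print(as.Date(y, origin = "1970-01-01"))
```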