Using dplyr

Time: 2019-02-08 20:47:22

Tags: r date dplyr lapply

I have a data frame that contains a list of subjects and a set of dates:

Subject    Date1       Date2       Date3      Date4      Date5     UniqueDate
001        12Mar02     03Apr02     08May02    09Jun02    22Jul02   02June02
002        15Feb05     03Mar05     18Apr05    01May05    16Jun05   22May05
...
100        22Jan09     01Feb09     28Mar09    10Apr09    21May09   29Jan09

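I want to find the name of the last date column that UniqueDate is greater than. For example, the result for Subject 001 should be Date3, since 08May02 is the latest date that falls before 02June02.

I don't have a working solution yet, but this is what I am currently trying: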

colnames(DF[, 2:5])[apply(DF,1,which.max(DF[i] - DF$UniqueDate)]

3 answers:

Answer 0: (score: 0)

Here is a solution that uses basically the whole tidyverse:

library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

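df %>% 
  # Pack Date1..Date5 into a one-row list-column called `data`.
  # tidyr 1.0.0 and later prefer: nest(data = -c(Subject, UniqueDate))
  nest(-Subject, -UniqueDate) %>% 
  # For each row, keep the last date that falls before UniqueDate
  mutate(latest_date = map2_chr(data, UniqueDate, ~ unlist(.x[max(which(dmy(.x) < dmy(.y)))])))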

#> # A tibble: 3 x 4
#>   Subject UniqueDate data             latest_date
#>     <dbl> <chr>      <list>           <chr>      
#> 1       1 02June02   <tibble [1 x 5]> 08May02    
#> 2       2 22May05    <tibble [1 x 5]> 01May05    
#> 3     100 29Jan09    <tibble [1 x 5]> 22Jan09

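The last line is a bit messy, but hopefully you can see what is going on. Note that latest_date holds the date value itself, not the column name.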
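To unpack that last line, here is a minimal sketch of the same logic applied by hand to a single row. The names `dates`, `unique_date`, `earlier` and `last_pos` are made up for illustration; the values are those of Subject 001.

```r
library(lubridate)

# Subject 001's five dates as a named character vector (hypothetical helper names)
dates <- c(Date1 = "12Mar02", Date2 = "03Apr02", Date3 = "08May02",
           Date4 = "09Jun02", Date5 = "22Jul02")
unique_date <- "02June02"

# Positions of the dates that fall strictly before UniqueDate
earlier <- which(dmy(dates) < dmy(unique_date))

# The last such position is the one we are after
last_pos <- max(earlier)

dates[last_pos]         # the date itself, which is what latest_date stores
names(dates)[last_pos]  # the column name the question asks for
```

If no date falls before UniqueDate, `max()` is called on an empty vector and returns `-Inf` with a warning, so real data may need a guard for that case.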

Hopefully there is a base R solution for this as well.
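One possible base R version, as a rough sketch rather than a tested answer: it rebuilds the question's three rows as a plain data.frame, and the names `date_cols`, `parsed`, `before`, `last_idx` and `last_col` are my own. It also assumes an English (or C) `LC_TIME` locale, because `%b` matches month names in the current locale.

```r
# The question's three rows; Subject is kept as character here
df <- data.frame(
  Subject    = c("001", "002", "100"),
  Date1      = c("12Mar02", "15Feb05", "22Jan09"),
  Date2      = c("03Apr02", "03Mar05", "01Feb09"),
  Date3      = c("08May02", "18Apr05", "28Mar09"),
  Date4      = c("09Jun02", "01May05", "10Apr09"),
  Date5      = c("22Jul02", "16Jun05", "21May09"),
  UniqueDate = c("02June02", "22May05", "29Jan09"),
  stringsAsFactors = FALSE
)

date_cols <- paste0("Date", 1:5)

# Parse every date column; on input, %b also accepts full month names such as "June"
parsed <- lapply(df[c(date_cols, "UniqueDate")], as.Date, format = "%d%b%y")

# Logical matrix: TRUE where DateN is strictly before UniqueDate
before <- do.call(cbind, lapply(parsed[date_cols],
                                function(col) col < parsed$UniqueDate))

# For each row, take the last TRUE column, or NA when there is none
last_idx <- apply(before, 1, function(r) {
  if (any(r)) max(which(r)) else NA_integer_
})

df$last_col <- date_cols[last_idx]
df[c("Subject", "UniqueDate", "last_col")]
```

Because it takes the last TRUE position per row instead of counting, this sketch does not depend on the date columns being sorted, although rows with unparseable dates would still need separate handling.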

Data

df <-
  tribble(~Subject,    ~Date1,       ~Date2,       ~Date3,      ~Date4,      ~Date5,     ~UniqueDate,
          001,        "12Mar02",     "03Apr02",     "08May02",    "09Jun02",    "22Jul02",   "02June02",
          002,        "15Feb05",     "03Mar05",     "18Apr05",    "01May05",    "16Jun05",   "22May05",
          100,        "22Jan09",     "01Feb09",     "28Mar09",    "10Apr09",    "21May09",   "29Jan09")

Answer 1: (score: 0)

Using a data.frame:

d <- data.frame("Subject" = c("001", "002", "003"),
                "Date1" = c("12Mar02", "15Feb05", "22Jan09"),
                "Date2" = c("03Apr02", "03Mar05", "01Feb09"),
                "Date3" = c("08May02", "18Apr05", "28Mar09"),
                "Date4" = c("09Jun02", "01May05", "10Apr09"),
                "Date5" = c("22Jul02", "16Jun05", "21May09"),
                "UniqueDate" = c("02June02", "22May05", "29Jan09"))

First, convert the date columns to a form that R recognizes as dates:

d[, 2:7] <- lapply(d[, 2:7], as.Date, format = "%d%b%y")
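A quick way to check that the format string behaves as expected is to parse a couple of the values on their own. This is only a sketch with made-up variable names (`x`, `parsed`), and it assumes an English (or C) `LC_TIME` locale, since `%b` matches month names in the current locale.

```r
# One abbreviated and one full month name, both taken from the data above
x <- c("12Mar02", "02June02")

# On input, %b accepts abbreviated and full month names;
# %y maps two-digit years 00-68 to 20xx
parsed <- as.Date(x, format = "%d%b%y")

format(parsed, "%Y-%m-%d")
anyNA(parsed)  # TRUE would mean at least one string failed to parse
```

If `anyNA()` returns TRUE on the real columns, the locale or the format string is the first thing to check.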

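Then store the desired result in a new column named result. Note that this only works if the dates in Date1 to Date5 are ordered from earliest to latest within each row, because it counts how many dates precede UniqueDate and uses that count as the column position: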

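# apply() coerces each row to character; the ISO-formatted dates
# ("2002-06-02") still compare correctly as strings
d$result <- apply(d, 1, function(x){
  sum(x["UniqueDate"] > x[2:6])  # number of date columns before UniqueDate
})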
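The count can then be turned into the column name the question asks for. The following is a minimal sketch under the same sorted-dates assumption. It rebuilds `d` with the dates already converted, and it counts with a column-wise comparison instead of `apply()`, which is a substitution on my part. The names `date_cols`, `n_before` and `result_col` are hypothetical.

```r
# Same data as above, with the dates already stored as Date objects
d <- data.frame(
  Subject    = c("001", "002", "003"),
  Date1      = as.Date(c("2002-03-12", "2005-02-15", "2009-01-22")),
  Date2      = as.Date(c("2002-04-03", "2005-03-03", "2009-02-01")),
  Date3      = as.Date(c("2002-05-08", "2005-04-18", "2009-03-28")),
  Date4      = as.Date(c("2002-06-09", "2005-05-01", "2009-04-10")),
  Date5      = as.Date(c("2002-07-22", "2005-06-16", "2009-05-21")),
  UniqueDate = as.Date(c("2002-06-02", "2005-05-22", "2009-01-29"))
)

date_cols <- names(d)[2:6]

# Count, per row, how many date columns fall strictly before UniqueDate
n_before <- rowSums(do.call(cbind, lapply(d[date_cols],
                                          function(col) col < d$UniqueDate)))

# A count of 0 means no column qualifies, so index with NA instead
n_before[n_before == 0] <- NA

# With sorted dates, the count is also the position of the last earlier column
d$result_col <- date_cols[n_before]
d[c("Subject", "UniqueDate", "result_col")]
```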

Answer 2: (score: 0)

For the sake of completeness, here is also a solution that uses a rolling join after reshaping the data to long format:

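library(data.table)
# DT is assumed to hold the question's data, with the dates still stored as text
long <- melt(setDT(DT), "Subject")[
  , value := lubridate::dmy(value)][]
# roll = Inf matches each UniqueDate to the latest date on or before it
long[variable != "UniqueDate"][long[variable == "UniqueDate"], 
                               on = .(Subject, value), .(Subject, variable), roll = Inf]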

Result

   Subject variable
1:       1    Date3
2:       2    Date4
3:     100    Date1
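Since the definition of `DT` is not shown here, the following is a self-contained sketch of the same rolling join. The `DT` below is my assumption, built from the question's three rows, and `dates`, `lookups` and `res` are hypothetical helper names.

```r
library(data.table)

# Assumed input: the question's three rows, dates still stored as text
DT <- data.table(
  Subject    = c(1, 2, 100),
  Date1      = c("12Mar02", "15Feb05", "22Jan09"),
  Date2      = c("03Apr02", "03Mar05", "01Feb09"),
  Date3      = c("08May02", "18Apr05", "28Mar09"),
  Date4      = c("09Jun02", "01May05", "10Apr09"),
  Date5      = c("22Jul02", "16Jun05", "21May09"),
  UniqueDate = c("02June02", "22May05", "29Jan09")
)

# Reshape to long format: one row per Subject and column name
long <- melt(DT, id.vars = "Subject", variable.factor = FALSE)
long[, value := lubridate::dmy(value)]

# Split into the candidate dates and the dates to look up
dates   <- long[variable != "UniqueDate"]
lookups <- long[variable == "UniqueDate"]

# For each UniqueDate, roll = Inf picks the latest candidate on or before it
res <- dates[lookups, on = .(Subject, value), roll = Inf,
             .(Subject, variable)]
res
```

One difference from the other answers: a rolling join also matches a date that is exactly equal to UniqueDate, whereas they use a strict `<` comparison.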