Question

I'm trying to solve a problem with lapply. As a SAS programmer, this way of thinking is new to me.

library(data.table)
library(lubridate)
library(dplyr)  # provides bind_rows() and %>% used below

I have a data.table like this:

DT <- data.table(idnum= c(1001,1001,1001,1002,1002,1003,1003,1003,1003),
             a_beg= c(16079, 16700, 17000, 16074, 17000, 16074, 17000, 18081, 19000),
             a_end= c(16500, 16850, 22000, 16900, 22000, 16950, 18000, 18950, 21000))

a_beg and a_end contain SAS date numbers (days since 1960-01-01).

Here is my function, years. I want to apply it to the data.table object and keep only the rows that overlap the study year.

 years <- function(DT, year) {

 DT <- DT[lubridate::date('1960-01-01')+a_beg <= lubridate::ymd(paste(year, 1, 1, sep = "-"))
        & lubridate::date('1960-01-01')+a_end >= lubridate::ymd(paste(year, 12, 31, sep = "-")), ]
 DT
 }

It works fine without apply:

year2005 <- years(DT, 2005)

I'd like to do something like this: step through the study years, combine the results with bind_rows, and pipe them into a data.table.

 DT <- bind_rows(lapply(DT, 2004:2015, years())) %>% data.table()

I want to use the iterator as an argument to the function, but I don't know how.

Answer 1

I think what you want is:

years <- function(year, DTbl) {
    #data.table changes by reference so you do not want your subset to overwrite the original DT
    DTbl[lubridate::date('1960-01-01')+a_beg <= lubridate::ymd(paste(year, 1, 1, sep = "-"))
            & lubridate::date('1960-01-01')+a_end >= lubridate::ymd(paste(year, 12, 31, sep = "-")), ]    
}
bind_rows(lapply(2004:2015, years, DTbl=DT)) %>% data.table()
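
Why this works: lapply hands each element of 2004:2015 to the function as its first argument and passes any extra named arguments (here DTbl = DT) through unchanged, which is why year has to come first in the signature. Below is a minimal sketch of the same pattern using only data.table and base R dates. The helper name covers_year and the year id column are made up for illustration; rbindlist with idcol is used instead of bind_rows so that each row is tagged with its study year.

```r
library(data.table)

# Same sample data as in the question (SAS dates: days since 1960-01-01)
DT <- data.table(idnum = c(1001,1001,1001,1002,1002,1003,1003,1003,1003),
                 a_beg = c(16079, 16700, 17000, 16074, 17000, 16074, 17000, 18081, 19000),
                 a_end = c(16500, 16850, 22000, 16900, 22000, 16950, 18000, 18950, 21000))

sas_origin <- as.Date("1960-01-01")

# Hypothetical helper: same filter as years(), written with base R dates.
# It returns a subset and leaves DTbl itself untouched.
covers_year <- function(year, DTbl) {
  first_day <- as.Date(sprintf("%d-01-01", year))
  last_day  <- as.Date(sprintf("%d-12-31", year))
  DTbl[sas_origin + a_beg <= first_day & sas_origin + a_end >= last_day]
}

study_years <- 2004:2015

# Each element of study_years becomes the first argument (year);
# DTbl = DT is forwarded as-is on every call.
per_year <- lapply(study_years, covers_year, DTbl = DT)
names(per_year) <- study_years

# Stack the pieces; idcol stores the list names in a "year" column
result <- rbindlist(per_year, idcol = "year")
```

Note that the filter, as written in the question, keeps only rows whose interval covers the whole calendar year (it starts on or before 1 January and ends on or after 31 December), which is stricter than merely overlapping it.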

Alternatively, using more data.table syntax, you can do a non-equi join, as follows:

DT[, ':=' (
    a_beg = as.Date(a_beg, origin="1960-01-01"),
    a_end = as.Date(a_end, origin="1960-01-01")
)]

yearRanges <- data.table(beg=seq(as.Date("2004-01-01"), by="1 year", length.out=12), 
    end=seq(as.Date("2004-12-31"), by="1 year", length.out=12))

DT[yearRanges,
    .(YEAR=year(beg), idnum=x.idnum, a_beg=x.a_beg, a_end=x.a_end),
    on=.(a_beg <= beg, a_end >= end),
    allow.cartesian=TRUE]
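
Two things about the join above are worth knowing. With the default nomatch, study years that match no row still appear in the result, with NA in the x. columns. And the condition, like years(), requires an interval to cover the full year. If "overlap" is meant in the looser sense (the interval touches the year at all), the comparison is made against the opposite year boundary. A sketch of that variant, assuming that reading of the question; the names spells and overlaps are made up for illustration, and nomatch = NULL needs a reasonably recent data.table:

```r
library(data.table)

# Same sample data as in the question (SAS dates: days since 1960-01-01)
DT <- data.table(idnum = c(1001,1001,1001,1002,1002,1003,1003,1003,1003),
                 a_beg = c(16079, 16700, 17000, 16074, 17000, 16074, 17000, 18081, 19000),
                 a_end = c(16500, 16850, 22000, 16900, 22000, 16950, 18000, 18950, 21000))

sas_origin <- as.Date("1960-01-01")

# Work on a copy so the := conversion does not modify DT by reference
spells <- copy(DT)[, `:=`(a_beg = sas_origin + a_beg,
                          a_end = sas_origin + a_end)]

# One row per study year with its first and last day
yearRanges <- data.table(YEAR = 2004:2015)
yearRanges[, `:=`(beg = as.Date(sprintf("%d-01-01", YEAR)),
                  end = as.Date(sprintf("%d-12-31", YEAR)))]

# Partial overlap: the spell starts on or before the year's last day
# and ends on or after the year's first day.
overlaps <- spells[yearRanges,
                   .(YEAR = i.YEAR, idnum = x.idnum,
                     a_beg = x.a_beg, a_end = x.a_end),
                   on = .(a_beg <= end, a_end >= beg),
                   nomatch = NULL,          # drop years with no matching spell
                   allow.cartesian = TRUE]
```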
