Unable to assign columns as Date by reference in data.table

Time: 2013-04-14 07:17:20

Tags: r data.table

I'm having trouble assigning a new column as Date or IDate when using by =. It creates a column of plain numbers instead of the expected Date.

require(data.table)
dt <- data.table(date = as.IDate(sample(10000:11000, 10), 
                                 origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]
print(dt)

#           date group
#  1: 1997-06-12     1
#  2: 1998-02-19     2
#  3: 1998-04-25     1
#  4: 1998-01-27     2
#  5: 1997-10-29     1
#  6: 1998-05-08     2
#  7: 1999-05-09     1
#  8: 1999-06-26     2
#  9: 1997-11-01     1
# 10: 1997-07-19     2

This works:

dt[, min.date := min(date)]
print(dt)

#           date group   min.date
#  1: 1997-06-12     1 1997-06-12
#  2: 1998-02-19     2 1997-06-12
#  3: 1998-04-25     1 1997-06-12
#  4: 1998-01-27     2 1997-06-12
#  5: 1997-10-29     1 1997-06-12
#  6: 1998-05-08     2 1997-06-12
#  7: 1999-05-09     1 1997-06-12
#  8: 1999-06-26     2 1997-06-12
#  9: 1997-11-01     1 1997-06-12
# 10: 1997-07-19     2 1997-06-12

But this is where the problem appears:

dt[, min.group.date := as.IDate(min(date)), by = group]
print(dt)

#           date group   min.date min.group.date
#  1: 1997-06-12     1 1997-06-12          10024
#  2: 1998-02-19     2 1997-06-12          10061
#  3: 1998-04-25     1 1997-06-12          10024
#  4: 1998-01-27     2 1997-06-12          10061
#  5: 1997-10-29     1 1997-06-12          10024
#  6: 1998-05-08     2 1997-06-12          10061
#  7: 1999-05-09     1 1997-06-12          10024
#  8: 1999-06-26     2 1997-06-12          10061
#  9: 1997-11-01     1 1997-06-12          10024
# 10: 1997-07-19     2 1997-06-12          10061

min.group.date is numeric rather than a Date:

dt[, class(min.group.date)]

# [1] "numeric"

If I first initialize the column as a Date or IDate, it works as expected:

dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"))
dt[, group := rep(1:2, 5)]

dt[, min.group.date := as.IDate(NA)]
dt[, min.group.date := min(date), by = group]

dt[, class(min.group.date)]
# [1] "IDate" "Date"
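
If the column has already been created as plain numbers, one possible repair is to convert it back in a separate, ungrouped step. The following is only a minimal sketch: the small table is hypothetical and simply reproduces the symptom, and it assumes the stored values are day counts since 1970-01-01, matching the origin used above.

```r
library(data.table)

# Hypothetical table reproducing the symptom: day counts stored as plain numbers
dt <- data.table(group = rep(1:2, 5),
                 min.group.date = rep(c(10024, 10061), 5))

# Ungrouped assignment: the whole column is replaced by an IDate vector
dt[, min.group.date := as.IDate(min.group.date, origin = "1970-01-01")]

# Check the class of the repaired column
dt[, class(min.group.date)]
```

Because there is no by = in the conversion step, the full column is swapped for the converted vector rather than being filled group by group.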

1 Answer:

Answer 0: (score: 1)

Paul, if what you want is the minimum date by group, this line will do:

dt[,min(date),by=group]

You should see something like this (because of the 'sample' call in your example, the dates below will obviously differ from yours):

   group         V1
1:     1 1997-11-19
2:     2 1997-12-04

If you want to see it on every row, you can join the tables:

setkey(dt,group) #always good practice
dt_min=dt[,min(date),by=group]
setnames(dt_min,"V1","min.group.Date") #you should NOT use colnames (see help('setnames'))
dt[dt_min]


    group       date min.group.Date
 1:     1 1999-01-30     1997-11-19
 2:     1 1999-11-27     1997-11-19
 3:     1 1999-11-11     1997-11-19
 4:     1 1997-11-19     1997-11-19
 5:     1 1999-05-06     1997-11-19
 6:     2 1999-07-11     1997-12-04
 7:     2 1997-12-04     1997-12-04
 8:     2 1998-07-28     1997-12-04
 9:     2 1998-10-23     1997-12-04
10:     2 1998-06-01     1997-12-04
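
The same join can also write its result back into dt by reference instead of printing a joined copy. This is a sketch rather than part of the answer above: it assumes a data.table version that supports the on = argument and the i. prefix in joins, and the seed and the column name min.group.date are arbitrary choices made for illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sampled dates are reproducible
dt <- data.table(date = as.IDate(sample(10000:11000, 10), origin = "1970-01-01"),
                 group = rep(1:2, 5))

# One row per group holding that group's earliest date
dt_min <- dt[, .(min.group.date = min(date)), by = group]

# Update join: for each row of dt, copy the matching value from dt_min
dt[dt_min, min.group.date := i.min.group.date, on = "group"]

# Inspect the class of the new column
dt[, class(min.group.date)]
```

The aggregation and the assignment are separate steps here, so the := call itself does not use by =, which is the combination the question ran into trouble with.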