I have this data.table:
> str(merged.tables_t)
Classes ‘data.table’ and 'data.frame': 324326 obs. of 18 variables:
$ Store : int 2 2 2 2 2 2 2 2 2 2 ...
$ DayOfWeek : int 1 1 3 7 5 5 4 2 6 7 ...
$ Date : Factor w/ 942 levels "2013-01-01","2013-01-02",..: 903 315 366 832 298 214 395 491 908 384 ...
$ Sales : int 4123 4017 0 0 4524 4776 4214 5992 2404 0 ...
$ Customers : int 491 509 0 0 531 545 493 628 303 0 ...
$ Open : int 1 1 0 0 1 1 1 1 1 0 ...
$ Promo : int 0 0 0 0 1 1 0 1 0 0 ...
$ StateHoliday : Factor w/ 4 levels "0","a","b","c": 1 1 2 1 1 1 1 1 1 1 ...
$ SchoolHoliday : int 0 0 1 0 1 1 0 0 0 0 ...
$ StoreType : Factor w/ 4 levels "a","b","c","d": 1 1 1 1 1 1 1 1 1 1 ...
$ Assortment : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
$ CompetitionDistance : int 570 570 570 570 570 570 570 570 570 570 ...
$ CompetitionOpenSinceMonth: int 11 11 11 11 11 11 11 11 11 11 ...
$ CompetitionOpenSinceYear : int 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
$ Promo2 : int 1 1 1 1 1 1 1 1 1 1 ...
$ Promo2SinceWeek : int 13 13 13 13 13 13 13 13 13 13 ...
$ Promo2SinceYear : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
$ PromoInterval : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 3 3 3 3 3 3 3 3 3 3 ...
- attr(*, ".internal.selfref")=<externalptr>
I just need to create a new variable that combines the two columns CompetitionOpenSinceYear and CompetitionOpenSinceMonth.
First, I create a new variable named CompetitionDate:
merged.tables_t[,"CompetitionDate"]<-NA
Then I try to fill in this variable with:
merged.tables_t[merged.tables_t[,19],as.character(as.Date(as.yearmon(with(merged.tables_t,sprintf("%d%02d",CompetitionOpenSinceYear,CompetitionOpenSinceMonth))))),]
It gives me this error:
Error in `[.data.table`(merged.tables_t, , , CompetitionDate = as.character(as.Date(as.yearmon(with(merged.tables_t, :
  unused argument (CompetitionDate = as.character(as.Date(as.yearmon(with(merged.tables_t, sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))))))
Note that when I use a data.frame, I get the desired result:
> merged.tables_d$CompetitionDate<-as.character(as.Date(as.yearmon(with(merged.tables_d,sprintf("%d-%02d",CompetitionOpenSinceYear,CompetitionOpenSinceMonth)))))
The result should look like this:
> head(merged.tables_d$CompetitionDate)
[1] "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01"
However, I need to use data.table rather than data.frame because it runs faster.
How can I get the same result with data.table? Thanks in advance.
Answer 0 (score: 1)
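Updated based on the OP's feedback: added an example that uses data.table to combine two columns (year and month) into a date-like column, newcol. The OP's intent is to use as.yearmon from the zoo package.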
library(data.table)
library(zoo)
# Sample data
dt <- data.table(CompetitionOpenSinceMonth = c(11, 11, 11, 11, 11, 11, 9, 10),
                 CompetitionOpenSinceYear = c(2007, 2007, 2007, 2007, 2007, 2007, 2006, 2006))
# Add the new column by reference with data.table's `:=` operator.
# "%02d" zero-pads the month, so the string reads "2006-09" rather than "2006- 9".
dt[, newcol := as.yearmon(
  sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))]
# Modified data.table
dt
#    CompetitionOpenSinceMonth CompetitionOpenSinceYear   newcol
# 1:                        11                     2007 Nov 2007
# 2:                        11                     2007 Nov 2007
# 3:                        11                     2007 Nov 2007
# 4:                        11                     2007 Nov 2007
# 5:                        11                     2007 Nov 2007
# 6:                        11                     2007 Nov 2007
# 7:                         9                     2006 Sep 2006
# 8:                        10                     2006 Oct 2006
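The example above leaves newcol as a yearmon value, whereas the question asks for character dates such as "2007-11-01". A minimal sketch of how the same `:=` approach might produce that, assuming the first day of the month is the wanted date and using a small made-up table (`dt` here is a hypothetical stand-in for merged.tables_t, with integer columns as in the str() output):

```r
library(data.table)
library(zoo)

# Hypothetical stand-in for merged.tables_t (values invented for illustration)
dt <- data.table(
  CompetitionOpenSinceYear  = c(2007L, 2007L, 2006L, 2006L),
  CompetitionOpenSinceMonth = c(11L, 11L, 9L, 10L)
)

# `:=` creates the column by reference inside j, so there is no need to
# pre-create it with NA or to wrap the expression in with().
# as.Date() on a yearmon value returns the first day of that month.
dt[, CompetitionDate := as.character(as.Date(as.yearmon(
  sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))))]

dt$CompetitionDate
# [1] "2007-11-01" "2007-11-01" "2006-09-01" "2006-10-01"
```

Column names used inside `[ ]` are looked up in the table itself, which is why the with(merged.tables_t, ...) wrapper from the data.frame version is not needed here.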