Unable to manipulate a data.table

Date: 2018-02-18 18:23:57

Tags: r

I have this data.table:

> str(merged.tables_t)
Classes ‘data.table’ and 'data.frame':  324326 obs. of  18 variables:
 $ Store                    : int  2 2 2 2 2 2 2 2 2 2 ...
 $ DayOfWeek                : int  1 1 3 7 5 5 4 2 6 7 ...
 $ Date                     : Factor w/ 942 levels "2013-01-01","2013-01-02",..: 903 315 366 832 298 214 395 491 908 384 ...
 $ Sales                    : int  4123 4017 0 0 4524 4776 4214 5992 2404 0 ...
 $ Customers                : int  491 509 0 0 531 545 493 628 303 0 ...
 $ Open                     : int  1 1 0 0 1 1 1 1 1 0 ...
 $ Promo                    : int  0 0 0 0 1 1 0 1 0 0 ...
 $ StateHoliday             : Factor w/ 4 levels "0","a","b","c": 1 1 2 1 1 1 1 1 1 1 ...
 $ SchoolHoliday            : int  0 0 1 0 1 1 0 0 0 0 ...
 $ StoreType                : Factor w/ 4 levels "a","b","c","d": 1 1 1 1 1 1 1 1 1 1 ...
 $ Assortment               : Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
 $ CompetitionDistance      : int  570 570 570 570 570 570 570 570 570 570 ...
 $ CompetitionOpenSinceMonth: int  11 11 11 11 11 11 11 11 11 11 ...
 $ CompetitionOpenSinceYear : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
 $ Promo2                   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Promo2SinceWeek          : int  13 13 13 13 13 13 13 13 13 13 ...
 $ Promo2SinceYear          : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
 $ PromoInterval            : Factor w/ 4 levels "","Feb,May,Aug,Nov",..: 3 3 3 3 3 3 3 3 3 3 ...
 - attr(*, ".internal.selfref")=<externalptr> 

I just need to create a new variable that combines the two columns CompetitionOpenSinceYear and CompetitionOpenSinceMonth.

First, I create a new variable called CompetitionDate:

merged.tables_t[,"CompetitionDate"]<-NA

Then I try to fill in the contents of this variable with:

merged.tables_t[merged.tables_t[,19],as.character(as.Date(as.yearmon(with(merged.tables_t,sprintf("%d%02d",CompetitionOpenSinceYear,CompetitionOpenSinceMonth))))),]

It gives me this error:

  

Error in `[.data.table`(merged.tables_t, , , CompetitionDate = as.character(as.Date(as.yearmon(with(merged.tables_t,  :
  unused argument (CompetitionDate = as.character(as.Date(as.yearmon(with(merged.tables_t, sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))))))

Note that when I use a data.frame, I get the desired result:

> merged.tables_d$CompetitionDate<-as.character(as.Date(as.yearmon(with(merged.tables_d,sprintf("%d-%02d",CompetitionOpenSinceYear,CompetitionOpenSinceMonth)))))

The result should look like this:

> head(merged.tables_d$CompetitionDate)
[1] "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01" "2007-11-01"

I do need to use data.table rather than data.frame, because it runs faster.

How can I get the same result with data.table? Thanks in advance.

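1 Answer:

Answer 0: (score: 1)

Updated based on the OP's feedback: I have added an example that uses data.table to combine two columns (year and month) into a date-like column, newcol, of class yearmon. The OP's intent is to use as.yearmon from the zoo package.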

library(data.table)
library(zoo)

# Data 
dt <- data.table(CompetitionOpenSinceMonth = c(11, 11, 11, 11, 11, 11, 9, 10),
     CompetitionOpenSinceYear = c(2007, 2007,  2007, 2007, 2007, 2007, 2006, 2006))

# Add another column by reference using data.table's `:=` operator;
# "%02d" zero-pads the month so every string has the form "YYYY-MM"
dt[, newcol := as.yearmon(
       sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))]

# Modified data.table
dt
#  CompetitionOpenSinceMonth CompetitionOpenSinceYear   newcol
#1:                        11                     2007 Nov 2007
#2:                        11                     2007 Nov 2007
#3:                        11                     2007 Nov 2007
#4:                        11                     2007 Nov 2007
#5:                        11                     2007 Nov 2007
#6:                        11                     2007 Nov 2007
#7:                         9                     2006 Sep 2006
#8:                        10                     2006 Oct 2006
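
To reproduce the character dates from the data.frame version in the question, the same conversion chain can go inside j with `:=`. The error in the question arises because `[.data.table` has no argument named CompetitionDate; assignment happens in j instead. The sketch below uses a small stand-in table rather than the real merged.tables_t, and assumes the year and month columns contain no missing values.

```r
library(data.table)
library(zoo)

# Small stand-in for merged.tables_t (illustrative values only)
dt <- data.table(
  CompetitionOpenSinceMonth = c(11L, 11L, 9L, 10L),
  CompetitionOpenSinceYear  = c(2007L, 2007L, 2006L, 2006L)
)

# Assign by reference inside j; the column does not have to be
# created with NA beforehand
dt[, CompetitionDate := as.character(as.Date(as.yearmon(
  sprintf("%d-%02d", CompetitionOpenSinceYear, CompetitionOpenSinceMonth))))]

dt$CompetitionDate
# [1] "2007-11-01" "2007-11-01" "2006-09-01" "2006-10-01"
```

Because `:=` modifies the table in place, there is no need to assign the result back to dt.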