Consider the data.table dt:
id boro block date end_date
1: 1 1 1 01/01/1991 01/01/1992
2: 1 1 2 01/01/1991 01/01/1992
3: 1 2 3 01/01/1991 01/01/1992
4: 1 2 4 01/01/1991 NA
5: 2 1 1 01/01/1992 01/01/1993
6: 2 1 2 01/01/1992 01/01/1993
7: 2 2 3 01/01/1992 NA
8: 2 2 5 01/01/1992 NA
9: 3 1 1 01/01/1993 NA
10: 3 1 2 01/01/1993 NA
11: 3 2 6 01/01/1993 NA
12: 3 2 7 01/01/1993 NA
where str(dt) outputs:
Classes ‘data.table’ and 'data.frame': 12 obs. of 5 variables:
$ id: num 1 1 1 1 2 2 2 2 3 3 ...
$ boro: num 1 1 2 2 1 1 2 2 1 1 ...
$ block: num 1 2 3 4 1 2 3 5 1 2 ...
$ date: Date, format: "1991-01-01" "1991-01-01" "1991-01-01" "1991-01-01"...
$ end_date: Date, format: "1992-01-01" "1992-01-01" "1992-01-01" NA ...
- attr(*, ".internal.selfref")=<externalptr>
I am trying to expand the rows over the date range given by date and end_date. That is, for the first row, I want to expand it to:
id boro block qtr
1: 1 1 1 1991-01-01
2: 1 1 1 1991-04-01
3: 1 1 1 1991-07-01
4: 1 1 1 1991-10-01
If end_date is NA, I want to return a single row containing the fields id, boro, block, and the quarter corresponding to date. That is, for row 4, return:
id boro block qtr
1: 1 2 4 1991-01-01
Following the suggestions in a similar question asked here, I tried:
dt[,.(id,boro,block,qtr = seq(date, end_date, by = "quarter")),by = 1:nrow(dt)]
But I receive the following output:
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) :
'to' must be a finite number
To deal with the fact that end_date can be NA, I tried:
dt[,ifelse(!(is.na(end_date)),
    .(id,boro,block,qtr = seq(date, end_date, by = "quarter")),
    .(id,boro,block,qtr = seq(date,date, by = "quarter"))),
    by = 1:nrow(dt)]
But, for reasons I do not understand, this does not give the desired output either.
Note: my actual data has 19 million rows and 70 columns, so efficiency matters, which is why I am using data.table.
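For reproducibility, dt can be rebuilt roughly as follows. This is only a sketch: it assumes the column types reported by str(dt), so date and end_date are created directly as Date values rather than parsed from the mm/dd/yyyy strings shown in the printout.

```r
library(data.table)

# Rebuild the 12-row sample shown above.
# Assumption: id, boro and block are numeric and both date columns are Date,
# matching the str(dt) output.
dt <- data.table(
  id    = rep(c(1, 2, 3), each = 4),
  boro  = rep(c(1, 1, 2, 2), times = 3),
  block = c(1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 6, 7),
  date  = as.Date(rep(c("1991-01-01", "1992-01-01", "1993-01-01"), each = 4)),
  end_date = as.Date(c(rep("1992-01-01", 3), NA,
                       rep("1993-01-01", 2), NA, NA,
                       rep(NA, 4)))
)

str(dt)
```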
Answer 0 (score: 2)
Answer 1 (score: 1)
Here is one possible approach using a data.table non-equi join:
dtcols <- c("date", "end_date")
dt[, (dtcols) := lapply(.SD, as.Date, format="%m/%d/%Y"), .SDcols=dtcols]
#create the quarters
quarters <- dt[,.(qtr=seq(min(date), max(end_date, na.rm=TRUE), by="quarter"))]
#perform non-equi join and then handle NA end_date
quarters[dt, .(id, boro, block, x.qtr, i.date, i.end_date),
by=.EACHI, on=.(qtr>=date, qtr<end_date)][,
.(id, boro, block,
qtr=as.Date(ifelse(is.na(i.end_date), i.date, x.qtr), origin="1970-01-01"))]
Output:
id boro block qtr
1: 1 1 1 1991-01-01
2: 1 1 1 1991-04-01
3: 1 1 1 1991-07-01
4: 1 1 1 1991-10-01
5: 1 1 2 1991-01-01
6: 1 1 2 1991-04-01
7: 1 1 2 1991-07-01
8: 1 1 2 1991-10-01
9: 1 2 3 1991-01-01
10: 1 2 3 1991-04-01
11: 1 2 3 1991-07-01
12: 1 2 3 1991-10-01
13: 1 2 4 1991-01-01
14: 2 1 1 1992-01-01
15: 2 1 1 1992-04-01
16: 2 1 1 1992-07-01
17: 2 1 1 1992-10-01
18: 2 1 2 1992-01-01
19: 2 1 2 1992-04-01
20: 2 1 2 1992-07-01
21: 2 1 2 1992-10-01
22: 2 2 3 1992-01-01
23: 2 2 5 1992-01-01
24: 3 1 1 1993-01-01
25: 3 1 2 1993-01-01
26: 3 2 6 1993-01-01
27: 3 2 7 1993-01-01
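For comparison, the NA handling can also be sketched as a plain per-row expansion instead of a join. This is a different technique from the non-equi join, shown only to make the intended logic explicit: the helper expand_quarters and the two-row sample_dt are hypothetical names introduced for illustration, and the interval is treated as half-open (end_date itself is excluded) to match the desired output in the question.

```r
library(data.table)

# Two sample rows taken from dt: one with an end_date, one without.
sample_dt <- data.table(
  id = c(1, 1), boro = c(1, 2), block = c(1, 4),
  date = as.Date(c("1991-01-01", "1991-01-01")),
  end_date = as.Date(c("1992-01-01", NA))
)

# Hypothetical helper: quarter starts in [from, to),
# or just `from` when `to` is NA.
expand_quarters <- function(from, to) {
  if (is.na(to)) return(seq(from, from, by = "quarter"))
  s <- seq(from, to, by = "quarter")
  s[s < to]
}

# One group per row; the length-1 id, boro and block are recycled against qtr.
res <- sample_dt[, .(id, boro, block, qtr = expand_quarters(date, end_date)),
                 by = .(row = seq_len(nrow(sample_dt)))]
res[, row := NULL]  # drop the helper grouping column
print(res)
```

Calling seq() once per row is likely to be slow on 19 million rows, so this is better suited to checking results on a small subset than to the full data.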