This question is an extension of the one here.
Suppose my data also has a column named Remark:
ID Name Type Date Amount Remark
1 AAAA First 2009/7/20 100 Not want
1 AAAA First 2010/2/3 200 want ya
2 BBBB First 2015/3/10 250
2 CCC Second 2009/2/23 300 good
2 CCC Second 2010/1/25 400 OK Right123
2 CCC Third 2015/4/9 500
2 CCC Third 2016/6/25 700 Stackoverflow is awesome
I want the result to keep the Remark from the row where Date is the maximum.
First, if I ignore the Remark column, I can get this with max():
dt[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]
ID Name Type Date Amount
1: 1 AAAA First 2010-02-03 300
2: 2 BBBB First 2015-03-10 250
3: 2 CCC Second 2010-01-25 700
4: 2 CCC Third 2016-06-25 1200
But how can I keep the Remark, so that I get the following?
ID Name Type Date Amount Remark
1: 1 AAAA First 2010-02-03 300 want ya
2: 2 BBBB First 2015-03-10 250
3: 2 CCC Second 2010-01-25 700 OK Right123
4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome
Here is my data:
library(data.table)
dt <- fread("
ID Name Type Date Amount Remark
1 AAAA First 2009/7/20 100 Not.want
1 AAAA First 2010/2/3 200 want.ya
2 BBBB First 2015/3/10 250
2 CCC Second 2009/2/23 300 good
2 CCC Second 2010/1/25 400 OK.Right123
2 CCC Third 2015/4/9 500
2 CCC Third 2016/6/25 700 Stackoverflow.is.awesome
", fill = TRUE)  # fill = TRUE pads the rows that have no Remark
dt$Date <- as.Date(dt$Date)
Answer 0 (score: 1)
We can use a join:
# Aggregate by group, then join back on the latest Date to pick up that row's Remark
setcolorder(dt[, setdiff(names(dt), "Amount"), with = FALSE][
  dt[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)],
  on = .(ID, Name, Type, Date)], names(dt))[]
# ID Name Type Date Amount Remark
#1: 1 AAAA First 2010-02-03 300 want ya
#2: 2 BBBB First 2015-03-10 250
#3: 2 CCC Second 2010-01-25 700 OK Right123
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome
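A self-contained sketch of the join approach, split into its two steps so each can be inspected. It types the sample data in directly instead of using fread, and assumes the blank remarks should be empty strings:

```r
library(data.table)

# Sample data from the question, typed in directly; blank remarks are ""
dt <- data.table(
  ID     = c(1, 1, 2, 2, 2, 2, 2),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# Step 1: one row per group with the latest Date and the total Amount
agg <- dt[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]

# Step 2: join the original rows (minus Amount) on the group columns plus the
# latest Date; the matching row brings its Remark along
res <- dt[, setdiff(names(dt), "Amount"), with = FALSE][agg, on = .(ID, Name, Type, Date)]
setcolorder(res, names(dt))
res
```

Because the join also matches on Date, a group whose latest Date appears on two rows would return both of them.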
Or, without a join:
# Sum Amount per group and take every other column from the row with the latest Date
dt1 <- dt[, c(Amount = sum(.SD[["Amount"]]),
              .SD[which.max(Date), setdiff(names(.SD), "Amount"), with = FALSE]),
          by = .(ID, Name, Type)]
setcolorder(dt1, names(dt))
dt1
# ID Name Type Date Amount Remark
#1: 1 AAAA First 2010-02-03 300 want ya
#2: 2 BBBB First 2015-03-10 250
#3: 2 CCC Second 2010-01-25 700 OK Right123
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome
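A more compact variant of the same which.max idea, sketched under the assumption that Remark is the only extra column to carry along (with several such columns, the .SD[which.max(Date), ...] form above scales better). The sample data are typed in directly:

```r
library(data.table)

dt <- data.table(
  ID     = c(1, 1, 2, 2, 2, 2, 2),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# which.max(Date) is the position of the latest row within each group, so
# Remark[which.max(Date)] is that row's remark (Date here is still the column)
res <- dt[, .(Date   = max(Date),
              Amount = sum(Amount),
              Remark = Remark[which.max(Date)]),
          by = .(ID, Name, Type)]
res
```

which.max returns the first maximum, so on tied dates this keeps the earlier row's Remark.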
If there are more Amount columns to be summed:
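nm1 <- grep("Amount\\d*", names(dt), value = TRUE)
# list() keeps Date as its own element; c(Date = max(Date), ...) would dispatch
# to c.Date instead of building a list
setcolorder(dt[, setdiff(names(dt), nm1), with = FALSE][dt[, c(list(Date = max(Date)),
  lapply(.SD, sum)), by = .(ID, Name, Type), .SDcols = nm1],
  on = .(ID, Name, Type, Date)], names(dt))[]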
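A runnable sketch of the multi-column case. Amount2 is a hypothetical second amount column added only for illustration; the aggregation builds its result with c(list(Date = ...), lapply(.SD, sum)) so that Date stays a separate list element:

```r
library(data.table)

# Sample data from the question plus a hypothetical Amount2 column
dt <- data.table(
  ID      = c(1, 1, 2, 2, 2, 2, 2),
  Name    = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type    = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date    = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                      "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount  = c(100, 200, 250, 300, 400, 500, 700),
  Amount2 = 1:7,
  Remark  = c("Not want", "want ya", "", "good", "OK Right123", "",
              "Stackoverflow is awesome")
)

nm1 <- grep("Amount\\d*", names(dt), value = TRUE)   # "Amount", "Amount2"

# Per group: latest Date plus the sum of every Amount* column
agg <- dt[, c(list(Date = max(Date)), lapply(.SD, sum)),
          by = .(ID, Name, Type), .SDcols = nm1]

# Join the non-amount columns back on the latest Date to pick up Remark
res <- setcolorder(dt[, setdiff(names(dt), nm1), with = FALSE][
  agg, on = .(ID, Name, Type, Date)], names(dt))
res
```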
Answer 1 (score: 1)
> df
ID Name Type Date Amount Remark
1: 1 AAAA First 03-02-2010 200 want ya
2: 2 CCC Third 09-04-2015 500
3: 2 BBBB First 10-03-2015 250
4: 1 AAAA First 20-07-2009 100 Not want
5: 2 CCC Second 23-02-2009 300 good
6: 2 CCC Second 25-01-2010 400 OK Right123
7: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome
> df2=df[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]
> df2
ID Name Type Date Amount
1: 2 BBBB First 10-03-2015 250
2: 1 AAAA First 20-07-2009 300
3: 2 CCC Second 25-01-2010 700
4: 2 CCC Third 25-06-2016 1200
> df[df2,]
ID Name Type Date Amount Remark i.ID i.Name i.Type i.Amount
1: 2 BBBB First 10-03-2015 250 2 BBBB First 250
2: 1 AAAA First 20-07-2009 100 Not want 1 AAAA First 300
3: 2 CCC Second 25-01-2010 400 OK Right123 2 CCC Second 700
4: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome 2 CCC Third 1200
> df3=df[df2,c("ID","Name","Type","Date","Remark","i.Amount")]
> df3
ID Name Type Date Remark i.Amount
1: 2 BBBB First 10-03-2015 250
2: 1 AAAA First 20-07-2009 Not want 300
3: 2 CCC Second 25-01-2010 OK Right123 700
4: 2 CCC Third 25-06-2016 Stackoverflow is awesome 1200
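In the transcript above, Date appears to be stored as dd-mm-yyyy text, so max(Date) compares strings; that is why AAAA ends up with 20-07-2009 and "Not want" rather than 2010-02-03 and "want ya". The df[df2,] join also relies on a key that is not shown (the i.ID, i.Name and i.Type columns suggest the key is Date alone). A sketch of the same idea with the text parsed into Date and an explicit on= join, with the rows retyped from the df listing and blank remarks assumed to be empty strings:

```r
library(data.table)

# Same rows as `df` in the transcript, with Date stored as dd-mm-yyyy text
df <- data.table(
  ID     = c(1, 2, 2, 1, 2, 2, 2),
  Name   = c("AAAA", "CCC", "BBBB", "AAAA", "CCC", "CCC", "CCC"),
  Type   = c("First", "Third", "First", "First", "Second", "Second", "Third"),
  Date   = c("03-02-2010", "09-04-2015", "10-03-2015", "20-07-2009",
             "23-02-2009", "25-01-2010", "25-06-2016"),
  Amount = c(200, 500, 250, 100, 300, 400, 700),
  Remark = c("want ya", "", "", "Not want", "good", "OK Right123",
             "Stackoverflow is awesome")
)

# Parse the text so max() compares dates rather than strings
df[, Date := as.Date(Date, format = "%d-%m-%Y")]

df2 <- df[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]

# Explicit join on the group columns plus the latest Date (no reliance on a key);
# i.Amount is the summed Amount coming from df2
df3 <- df[df2, .(ID, Name, Type, Date, Remark, Amount = i.Amount),
          on = .(ID, Name, Type, Date)]
df3
```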