In the following data frame:
id<-c(1,1,1,1,1,3,3,3,3)
spent<-c(10,20,30,40,50,60,70,80,90)
date<-c("11-11-07","11-11-07","23-11-07","12-12-08","17-12-08","11-11-07","23-11-07","23-11-07","16-01-08")
df<-data.frame(id,date,spent)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")
id date spent date2
1 1 11-11-07 10 2007-11-11
2 1 11-11-07 20 2007-11-11
3 1 23-11-07 30 2007-11-23
4 1 12-12-08 40 2008-12-12
5 1 17-12-08 50 2008-12-17
6 3 11-11-07 60 2007-11-11
7 3 23-11-07 70 2007-11-23
8 3 23-11-07 80 2007-11-23
9 3 16-01-08 90 2008-01-16
I need to find the maximum of spent for each id on each day, and record it in a separate column, like this:
id date spent date2 sum.spent
1 1 11-11-07 10 2007-11-11 20
2 1 11-11-07 20 2007-11-11 20
3 1 23-11-07 30 2007-11-23 30
4 1 12-12-08 40 2008-12-12 40
5 1 17-12-08 50 2008-12-17 50
6 3 11-11-07 60 2007-11-11 60
7 3 23-11-07 70 2007-11-23 80
8 3 23-11-07 80 2007-11-23 80
9 3 16-01-08 90 2008-01-16 90
Can anyone help me with this?
Answer 0 (score: 5)
Here is a simple way to do it using ave():
df$sum.spent <- ave(df$spent, df$id, df$date2, FUN = max)
df
# id date spent date2 sum.spent
# 1 1 11-11-07 10 2007-11-11 20
# 2 1 11-11-07 20 2007-11-11 20
# 3 1 23-11-07 30 2007-11-23 30
# 4 1 12-12-08 40 2008-12-12 40
# 5 1 17-12-08 50 2008-12-17 50
# 6 3 11-11-07 60 2007-11-11 60
# 7 3 23-11-07 70 2007-11-23 80
# 8 3 23-11-07 80 2007-11-23 80
# 9 3 16-01-08 90 2008-01-16 90
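As a minimal sketch of why ave() fits here: it applies FUN within each combination of the grouping vectors and returns a vector as long as its input, in the original row order, so the result can be assigned straight into a new column. The vectors below (day, grp_max) are made-up illustration data, not part of the question.

```r
# Toy data: two ids, with days labelled "a" and "b" (hypothetical values).
id    <- c(1, 1, 1, 3, 3)
day   <- c("a", "a", "b", "a", "a")
spent <- c(10, 20, 30, 60, 40)

# The per-group maximum is repeated for every row in that group.
grp_max <- ave(spent, id, day, FUN = max)
grp_max
# [1] 20 20 30 60 60
```

Because nothing is reordered or collapsed, no merge back onto the data frame is needed.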
Or using data.table:
library(data.table)
# data.table 1.8.2 For help type: help("data.table")
dfDT <- data.table(df, key="id,date2")
dfDT[, sum.spent:=max(spent), by=key(dfDT)]
# id date spent date2 sum.spent
# 1: 1 11-11-07 10 2007-11-11 20
# 2: 1 11-11-07 20 2007-11-11 20
# 3: 1 23-11-07 30 2007-11-23 30
# 4: 1 12-12-08 40 2008-12-12 40
# 5: 1 17-12-08 50 2008-12-17 50
# 6: 3 11-11-07 60 2007-11-11 60
# 7: 3 23-11-07 70 2007-11-23 80
# 8: 3 23-11-07 80 2007-11-23 80
# 9: 3 16-01-08 90 2008-01-16 90
Answer 1 (score: 4)
Here is a plyr answer:
library(plyr)
ddply(df, .(id, date), transform, sum.spent = max(spent))
And here is a data.table answer (better for larger datasets):
library(data.table)
df <- data.table(df)
df[, sum.spent:=max(spent), by = list(id, date)]
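One way to check that the grouped := assignment agrees with the base R result is sketched below. It rebuilds the question's data so that it runs on its own. The names dt and expected are illustrative, and using as.data.table() to leave df as a plain data frame is a choice made for this sketch, not something the answers above require.

```r
library(data.table)

# The data from the question, rebuilt so this sketch is self-contained.
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 3, 3, 3, 3),
  date  = c("11-11-07", "11-11-07", "23-11-07", "12-12-08", "17-12-08",
            "11-11-07", "23-11-07", "23-11-07", "16-01-08"),
  spent = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# Base R reference result, one value per row.
expected <- ave(df$spent, df$id, df$date, FUN = max)

# as.data.table() copies df, so the original data frame is left untouched.
dt <- as.data.table(df)

# := adds the column by reference; by = groups on id and date.
dt[, sum.spent := max(spent), by = .(id, date)]

# No key was set, so the row order is unchanged and the two results line up.
stopifnot(isTRUE(all.equal(dt$sum.spent, expected)))
```

If the check passes, stopifnot() returns silently. A failure would raise an error naming the condition.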