I have been trying to subset my data by following other posts, but I keep getting errors. My data, new, looks like this:
id year name gdp
1 1980 Jamie 45
1 1981 Jamie 60
1 1982 Jamie 70
2 1990 Kate 40
2 1991 Kate 25
2 1992 Kate 67
3 1994 Joe 35
3 1995 Joe 78
3 1996 Joe 90
For each id, I want to select the row with the highest year. So the desired output is:
id year name gdp
1 1982 Jamie 70
2 1992 Kate 67
3 1996 Joe 90
Following "Selecting Rows which contain daily max value in R", I tried the following, but it did not work:
ddply(new,~id,function(x){x[which.max(new$year),]})
I also tried
tapply(new$year, new$id, max)
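but that did not give me the desired output either: it returns only the maximum year per id, not the full rows.
Any suggestions would be a great help!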
Answer 0: (score: 3)
Another option, which scales well to large tables, is to use data.table.
DT <- read.table(text = "id year name gdp
1 1980 Jamie 45
1 1981 Jamie 60
1 1982 Jamie 70
2 1990 Kate 40
2 1991 Kate 25
2 1992 Kate 67
3 1994 Joe 35
3 1995 Joe 78
3 1996 Joe 90",
header = TRUE)
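library(data.table)
DT <- as.data.table(DT)
setkey(DT, id, year)
# Latest year for each id (max(year), not the year with the highest gdp)
res <- DT[, list(year = max(year)), by = id]
res
setkey(res, id, year)
# Join back on (id, year) to recover the full rows
DT[res]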
# id year name gdp
# 1: 1 1982 Jamie 70
# 2: 2 1992 Kate 67
# 3: 3 1996 Joe 90
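The two-step key-and-join above can also be collapsed into a single grouped subset. The following is a minimal sketch, assuming the data.table package is installed; the variable name latest is only illustrative. Within each id group, .SD holds that group's rows, and which.max(year) picks the first row with the highest year.

```r
library(data.table)

DT <- as.data.table(read.table(text = "id year name gdp
1 1980 Jamie 45
1 1981 Jamie 60
1 1982 Jamie 70
2 1990 Kate 40
2 1991 Kate 25
2 1992 Kate 67
3 1994 Joe 35
3 1995 Joe 78
3 1996 Joe 90",
header = TRUE))

# For each id, keep the single row of .SD whose year is largest
latest <- DT[, .SD[which.max(year)], by = id]
latest
```

On very large tables the join-based version may still be faster, since .SD subsetting per group carries some overhead.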
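Answer 1: (score: 3)
ave works here too, and it also handles the case where several rows share the maximum year for an id (all of them are kept).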
new[with(new, year == ave(year,id,FUN=max) ),]
# id year name gdp
#3 1 1982 Jamie 70
#6 2 1992 Kate 67
#9 3 1996 Joe 90
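To illustrate the tie behaviour, here is a small sketch on made-up data (the values below are hypothetical, not from the question) in which id 1 has two rows for its latest year. Because the comparison year == ave(year, id, FUN = max) is TRUE for every row that matches the group maximum, both tied rows are returned.

```r
# Hypothetical data: id 1 has two rows with year 1982
new <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  year = c(1980, 1982, 1982, 1990, 1991),
  name = c("Jamie", "Jamie", "Jamie", "Kate", "Kate"),
  gdp  = c(45, 60, 70, 40, 25)
)

# ave() returns, for every row, the max year of that row's id group
latest <- new[with(new, year == ave(year, id, FUN = max)), ]
latest
```

If exactly one row per id is needed, a which.max-based approach is the safer choice, since it returns only the first maximum.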
Answer 2: (score: 2)
Just use split:
df <- do.call(rbind, lapply(split(df, df$id),
function(subdf) subdf[which.max(subdf$year)[1], ]))
For example,
df <- data.frame(id = rep(1:10, each = 3), year = round(runif(30,0,10)) + 1980, gdp = round(runif(30, 40, 70)))
print(head(df))
# id year gdp
# 1 1 1990 49
# 2 1 1981 47
# 3 1 1987 69
# 4 2 1985 57
# 5 2 1989 41
# 6 2 1988 54
df <- do.call(rbind, lapply(split(df, df$id), function(subdf) subdf[which.max(subdf$year)[1], ]))
print(head(df))
# id year gdp
# 1 1 1990 49
# 2 2 1989 41
# 3 3 1989 55
# 4 4 1988 62
# 5 5 1989 48
# 6 6 1990 41
Answer 3: (score: 2)
You can use duplicated:
# your data
df <- read.table(text="id year name gdp
1 1980 Jamie 45
1 1981 Jamie 60
1 1982 Jamie 70
2 1990 Kate 40
2 1991 Kate 25
2 1992 Kate 67
3 1994 Joe 35
3 1995 Joe 78
3 1996 Joe 90" , header=TRUE)
# Sort by id and year (latest year is last for each id)
df <- df[order(df$id , df$year), ]
# Select the last row by id
df <- df[!duplicated(df$id, fromLast=TRUE), ]
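The sort step matters: duplicated(..., fromLast=TRUE) only keeps the last row it sees for each id, so that row is the latest year only if the data are ordered by year within id. A minimal sketch on deliberately unsorted data (a hypothetical subset of the values above, without the name column) shows this:

```r
# Unsorted input: the 1992 row for id 2 comes first
df <- data.frame(
  id   = c(2, 1, 1, 2, 3),
  year = c(1992, 1980, 1982, 1990, 1996),
  gdp  = c(67, 45, 70, 40, 90)
)

# Order by id, then year, so the latest year is last within each id
sorted <- df[order(df$id, df$year), ]

# Keep only the last occurrence of each id
latest <- sorted[!duplicated(sorted$id, fromLast = TRUE), ]
latest
```

Unlike the ave approach, this always returns exactly one row per id, even when several rows share the maximum year.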
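Answer 4: (score: 1)
Your ddply attempt is almost right, but inside the callback function you referenced the original dataset (new) instead of the per-group subset (x).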
ddply(new,~id,function(x){x[which.max(new$year),]})
# should be
ddply(new,.(id),function(x){x[which.max(x$year),]})
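Why the original version fails can be checked in base R without plyr. This is a sketch that mimics what the callback sees for one group (here x is built by hand as the id 1 subset): which.max(new$year) is an index into the full nine-row table, so using it on a three-row subset points past the end and yields a row of NAs.

```r
new <- read.table(text = "id year name gdp
1 1980 Jamie 45
1 1981 Jamie 60
1 1982 Jamie 70
2 1990 Kate 40
2 1991 Kate 25
2 1992 Kate 67
3 1994 Joe 35
3 1995 Joe 78
3 1996 Joe 90",
header = TRUE)

# The subset the callback would receive for id 1
x <- new[new$id == 1, ]

# Index into the full table: position of the overall latest year
idx_full <- which.max(new$year)

wrong <- x[idx_full, ]            # out of range for a 3-row subset: all NA
right <- x[which.max(x$year), ]   # index computed on the subset itself
```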