Say I have a data frame that looks like this:
playerID yearID salary
1 abbotje01 1998 175000
2 abbotje01 1999 255000
3 abbotje01 2000 255000
4 abbotje01 2001 300000
5 abbotku01 1993 109000
6 abbotku01 1994 109000
.
.
.
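How can I get a data frame in which every row for a given playerID carries that player's salary from their most recent year, like this: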
playerID yearID salary
1 abbotje01 1998 300000
2 abbotje01 1999 300000
3 abbotje01 2000 300000
4 abbotje01 2001 300000
5 abbotku01 1993 109000
6 abbotku01 1994 109000
I want to keep every row for each playerID and simply reassign the same salary to all of them.
Answer 0 (score: 1)
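After grouping by 'playerID', use which.max to get the index of the maximum 'yearID', extract the 'salary' at that index, and overwrite the 'salary' column with mutate: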
library(dplyr)
df1 %>%
group_by(playerID) %>%
mutate(salary = salary[which.max(yearID)])
# A tibble: 6 x 3
# Groups: playerID [2]
# playerID yearID salary
# <chr> <int> <int>
#1 abbotje01 1998 300000
#2 abbotje01 1999 300000
#3 abbotje01 2000 300000
#4 abbotje01 2001 300000
#5 abbotku01 1993 109000
#6 abbotku01 1994 109000
Or with data.table, which updates df1 by reference:
library(data.table)
setDT(df1)[, salary := salary[which.max(yearID)], playerID]
Data:
df1 <- structure(list(playerID = c("abbotje01", "abbotje01", "abbotje01",
"abbotje01", "abbotku01", "abbotku01"), yearID = c(1998L, 1999L,
2000L, 2001L, 1993L, 1994L), salary = c(175000L, 255000L, 255000L,
300000L, 109000L, 109000L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
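The same which.max idea can also be sketched in base R without sorting the rows first. This is only an illustrative sketch, not part of the original answers: the helper name assign_latest_salary is hypothetical, and it assumes the columns are named playerID, yearID and salary as in the question.

```r
# Sample data, copied from the answer above
df1 <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01",
               "abbotje01", "abbotku01", "abbotku01"),
  yearID   = c(1998L, 1999L, 2000L, 2001L, 1993L, 1994L),
  salary   = c(175000L, 255000L, 255000L, 300000L, 109000L, 109000L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each playerID, look up the salary in the row
# with the largest yearID and assign it to every row of that player.
assign_latest_salary <- function(df) {
  df$salary <- ave(
    seq_len(nrow(df)),   # row indices, split by player
    df$playerID,
    FUN = function(idx) df$salary[idx][which.max(df$yearID[idx])]
  )
  df
}

result <- assign_latest_salary(df1)
print(result)
```

Note that which.max returns the first maximum, so if a player has two rows for the same latest year, the salary from the first of those rows is used.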
Answer 1 (score: 0)
We can order the data frame by yearID and then take the last salary from each group.
This can be done in base R:
df <- df[with(df, order(playerID, yearID)), ]
df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) x[length(x)]))
# Alternatively, using tail()
#df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) tail(x, 1)))
df
# playerID yearID salary final_salary
#1 abbotje01 1998 175000 300000
#2 abbotje01 1999 255000 300000
#3 abbotje01 2000 255000 300000
#4 abbotje01 2001 300000 300000
#5 abbotku01 1993 109000 109000
#6 abbotku01 1994 109000 109000
In dplyr:
library(dplyr)
df %>%
arrange(playerID, yearID) %>%
group_by(playerID) %>%
mutate(final_salary = last(salary))
And in data.table:
library(data.table)
setDT(df)
df[order(yearID), final_salary := last(salary), playerID]